  OpenShift Logging / LOG-2822

Evaluating rule failure in LokiRuler pods for Alerting and recording rules


Details

    • False
    • None
    • False
    • NEW
    • OBSDA-115 - Create alerting rules based on logs
    • VERIFIED
    • Log Storage - Sprint 226

    Description

      LokiRuler pods log "Evaluating rule failed" warnings after alerting and recording rules are created for the application and infrastructure tenants.

      Error:

      level=warn ts=2022-07-12T23:39:47.953761624Z caller=pool.go:184 msg="removing ingester failing healthcheck" addr=10.131.0.25:9095 reason="rpc error: code = Unavailable desc = connection closed before server preface received"
      level=warn ts=2022-07-12T23:39:47.954127337Z caller=pool.go:184 msg="removing ingester failing healthcheck" addr=10.129.2.15:9095 reason="rpc error: code = Unavailable desc = connection closed before server preface received"
      level=info ts=2022-07-12T23:41:02.380901027Z caller=metrics.go:122 component=ruler org_id=application latency=fast query="(count_over_time({kubernetes_namespace_name=\"my-user-workload\", kubernetes_pod_name=~\"centos-logtest.*\"}[2m]) > 10)" query_type=metric range_type=instant length=0s step=0s duration=673.313µs status=500 limit=0 returned_lines=0 throughput=0B total_bytes=0B queue_time=0s subqueries=1
      level=warn ts=2022-07-12T23:41:02.380948952Z caller=manager.go:610 user=application group=HighAppLogsToLoki2m msg="Evaluating rule failed" rule="record: loki:operator:applogs:rate2m\nexpr: (count_over_time({kubernetes_namespace_name=\"my-user-workload\", kubernetes_pod_name=~\"centos-logtest.*\"}[2m])\n > 10)\n" err="rpc error: code = Unavailable desc = connection closed before server preface received"
      level=warn ts=2022-07-12T23:41:02.952428483Z caller=pool.go:184 msg="removing ingester failing healthcheck" addr=10.129.2.15:9095 reason="rpc error: code = Unavailable desc = connection closed before server preface received"
      level=warn ts=2022-07-12T23:41:02.952475863Z caller=pool.go:184 msg="removing ingester failing healthcheck" addr=10.131.0.25:9095 reason="rpc error: code = Unavailable desc = connection closed before server preface received"
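To triage ruler output like the log above, the failing rule evaluations can be separated from the ingester healthcheck warnings. The following is a minimal sketch, not part of the original report: the helper names (`split_entries`, `parse_logfmt`, `failed_rule_evaluations`) are hypothetical, and it assumes the log is logfmt-style `key=value` text in which every entry starts with `level=... ts=...`. The sample text is copied from two of the entries in this issue.

```python
import re

# One logfmt pair: key=value, where the value is either a double-quoted
# string (backslash escapes allowed) or a run of non-space characters.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S*)')


def split_entries(blob):
    """Split text in which several log entries were joined onto one line.

    Heuristic (assumption): every entry begins with "level=<x> ts=".
    """
    return [e for e in re.split(r'\s+(?=level=\w+ ts=)', blob.strip()) if e]


def parse_logfmt(entry):
    """Parse a single logfmt entry into a dict of field name -> value."""
    fields = {}
    for key, value in PAIR.findall(entry):
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            # Strip the surrounding quotes and undo the two escapes seen here.
            value = value[1:-1].replace('\\"', '"').replace('\\n', '\n')
        fields[key] = value
    return fields


def failed_rule_evaluations(blob):
    """Return tenant, group, and error for each 'Evaluating rule failed' entry."""
    failures = []
    for entry in split_entries(blob):
        fields = parse_logfmt(entry)
        if fields.get("msg") == "Evaluating rule failed":
            failures.append({
                "tenant": fields.get("user"),
                "group": fields.get("group"),
                "err": fields.get("err"),
            })
    return failures


# Two entries copied from the log in this issue, joined by a single space.
SAMPLE = (
    r'level=warn ts=2022-07-12T23:39:47.953761624Z caller=pool.go:184 '
    r'msg="removing ingester failing healthcheck" addr=10.131.0.25:9095 '
    r'reason="rpc error: code = Unavailable desc = connection closed before server preface received" '
    r'level=warn ts=2022-07-12T23:41:02.380948952Z caller=manager.go:610 '
    r'user=application group=HighAppLogsToLoki2m msg="Evaluating rule failed" '
    r'rule="record: loki:operator:applogs:rate2m\nexpr: (count_over_time({kubernetes_namespace_name=\"my-user-workload\", kubernetes_pod_name=~\"centos-logtest.*\"}[2m])\n > 10)\n" '
    r'err="rpc error: code = Unavailable desc = connection closed before server preface received"'
)

if __name__ == "__main__":
    for failure in failed_rule_evaluations(SAMPLE):
        print(failure)
```

In this log the rule failure and the healthcheck warnings carry the same gRPC error text, which suggests that the rule evaluation fails on the ruler's connection to the ingesters rather than on the rule expression itself.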

      Steps to reproduce:

      1) Deploy the Loki Operator, then create the bucket secret and the LokiStack CR.

      LokiStack CR:

      spec:
        size: 1x.small
        storage:
          schemas:
          - version: v12
            effectiveDate: 2022-06-01
          secret:
            name: test
            type: s3
        storageClassName: gp2
        tenants:
          mode: openshift-logging
        rules:
          enabled: true
          selector:
            matchLabels:
              openshift.io/cluster-monitoring: "true"
          namespaceSelector:
            matchLabels:
              openshift.io/cluster-monitoring: "true"
       
      

      2) Validate that LokiRuler pods are up and running.

      [kbharti@cube hack]$ oc get pods  | grep ruler
      lokistack-dev-ruler-0                           1/1     Running   0          3h47m
      lokistack-dev-ruler-1                           1/1     Running   0          3h47m
      

      3) Deploy an application in the my-user-workload namespace, with the label openshift.io/cluster-monitoring: 'true' set on the namespace.
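The namespace label matters because the LokiStack CR in step 1 sets rules.namespaceSelector.matchLabels to openshift.io/cluster-monitoring: "true". As a rough illustration, assuming standard Kubernetes matchLabels semantics (every listed key/value pair must be present on the object), the check can be sketched as below. The `matches` helper and the example label sets are hypothetical and are not taken from the operator's code.

```python
def matches(match_labels, labels):
    """Return True if every key/value pair in match_labels appears in labels."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Selector copied from the LokiStack CR in step 1.
NAMESPACE_SELECTOR = {"openshift.io/cluster-monitoring": "true"}

# Illustrative label sets for the my-user-workload namespace (assumptions).
labelled_namespace = {
    "kubernetes.io/metadata.name": "my-user-workload",
    "openshift.io/cluster-monitoring": "true",
}
unlabelled_namespace = {
    "kubernetes.io/metadata.name": "my-user-workload",
}

if __name__ == "__main__":
    print(matches(NAMESPACE_SELECTOR, labelled_namespace))    # selected
    print(matches(NAMESPACE_SELECTOR, unlabelled_namespace))  # not selected
```

The same selector value is used for rules.selector in the CR, so under this assumption the alerting and recording rule objects created in step 4 would need the same label to be picked up.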

      4) Create Alerting and recording rules

      Application alerting and recording rules: http://pastebin.test.redhat.com/1062953

      Infra alerting and recording rules: http://pastebin.test.redhat.com/1062954

      5) Verify that the Loki ruler config map contains the rule data.

      6) Check logs on Loki Ruler pods.

      Expected Result: Rules should be created successfully, and the ruler pods should restart without errors.

      Actual Result: The "Evaluating rule failed" error appears in the Loki ruler pod logs.


          People

            rh-ee-mbouqsim Mohamed-Amine Bouqsimi (Inactive)
            rhn-support-kbharti Kabir Bharti
            Kabir Bharti