-
Bug
-
Resolution: Done
-
Undefined
-
Logging 5.5.0
-
False
-
None
-
False
-
NEW
-
OBSDA-115 - Create alerting rules based on logs
-
VERIFIED
-
Log Storage - Sprint 226
LokiRuler pods are showing Evaluating rule failure after alerting and recording rules are created for App and Infra tenants.
Error:
level=warn ts=2022-07-12T23:39:47.953761624Z caller=pool.go:184 msg="removing ingester failing healthcheck" addr=10.131.0.25:9095 reason="rpc error: code = Unavailable desc = connection closed before server preface received" level=warn ts=2022-07-12T23:39:47.954127337Z caller=pool.go:184 msg="removing ingester failing healthcheck" addr=10.129.2.15:9095 reason="rpc error: code = Unavailable desc = connection closed before server preface received" level=info ts=2022-07-12T23:41:02.380901027Z caller=metrics.go:122 component=ruler org_id=application latency=fast query="(count_over_time({kubernetes_namespace_name=\"my-user-workload\", kubernetes_pod_name=~\"centos-logtest.*\"}[2m]) > 10)" query_type=metric range_type=instant length=0s step=0s duration=673.313µs status=500 limit=0 returned_lines=0 throughput=0B total_bytes=0B queue_time=0s subqueries=1 level=warn ts=2022-07-12T23:41:02.380948952Z caller=manager.go:610 user=application group=HighAppLogsToLoki2m msg="Evaluating rule failed" rule="record: loki:operator:applogs:rate2m\nexpr: (count_over_time({kubernetes_namespace_name=\"my-user-workload\", kubernetes_pod_name=~\"centos-logtest.*\"}[2m])\n > 10)\n" err="rpc error: code = Unavailable desc = connection closed before server preface received" level=warn ts=2022-07-12T23:41:02.952428483Z caller=pool.go:184 msg="removing ingester failing healthcheck" addr=10.129.2.15:9095 reason="rpc error: code = Unavailable desc = connection closed before server preface received" level=warn ts=2022-07-12T23:41:02.952475863Z caller=pool.go:184 msg="removing ingester failing healthcheck" addr=10.131.0.25:9095 reason="rpc error: code = Unavailable desc = connection closed before server preface received"
Steps to reproduce:
1) Deploy LokiOperator and create bucket secret and LokiStack CR
LokiStack CR:
spec: size: 1x.small storage: schemas: - version: v12 effectiveDate: 2022-06-01 secret: name: test type: s3 storageClassName: gp2 tenants: mode: openshift-logging rules: enabled: true selector: matchLabels: openshift.io/cluster-monitoring: "true" namespaceSelector: matchLabels: openshift.io/cluster-monitoring: "true"
2) Validate that LokiRuler pods are up and running.
[kbharti@cube hack]$ oc get pods | grep ruler lokistack-dev-ruler-0 1/1 Running 0 3h47m lokistack-dev-ruler-1 1/1 Running 0 3h47m
3) Deploy Application in my-user-workload namespace with openshift.io/cluster-monitoring: 'true' label on namespace.
4) Create Alerting and recording rules
Application alerting and recording rules: http://pastebin.test.redhat.com/1062953
Infra alerting and recording rules: http://pastebin.test.redhat.com/1062954
5) Validate Loki ruler config map for data.
6) Check logs on Loki Ruler pods.
Expected Result: Rules should create successfully and ruler pods should restart without any error
Actual Result: Error is seen on Loki Ruler pods.