-
Bug
-
Resolution: Done
-
Critical
Yesterday a major DPCR outage, and with it a Loki outage, took the system down entirely. One test failed as a result:
[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
[
  {
    "metric": {
      "__name__": "ALERTS",
      "alertname": "KubeDaemonSetRolloutStuck",
      "alertstate": "firing",
      "container": "kube-rbac-proxy-main",
      "daemonset": "loki-promtail",
      "endpoint": "https-main",
      "job": "kube-state-metrics",
      "namespace": "openshift-e2e-loki",
      "prometheus": "openshift-monitoring/k8s",
      "service": "kube-state-metrics",
      "severity": "warning"
    },
    "value": [ 1709071917.851, "1" ]
  }
]
The query this test uses should be adapted to exclude every alert in the openshift-e2e-loki namespace.
Ideally the fix would be backported, but if that is too cumbersome we could fix it going forward only.
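As a rough illustration of the query change proposed above, here is a minimal sketch that builds an ALERTS selector with an extra negative matcher on the namespace label. The base selector, the function name firing_alerts_query, and the two lists are assumptions made for illustration; they are not the query or code the real test uses. Only the alert names and the namespace come from this report.

```python
# Hypothetical sketch: build a PromQL selector for firing alerts that skips
# the openshift-e2e-loki namespace. The shape of the base selector is an
# assumption, not the query the actual test runs.

ALLOWED_ALERTS = ["Watchdog", "AlertmanagerReceiversNotConfigured"]
IGNORED_NAMESPACES = ["openshift-e2e-loki"]


def firing_alerts_query(allowed_alerts, ignored_namespaces):
    """Return an ALERTS selector excluding the given alert names and namespaces."""
    matchers = [
        'alertstate="firing"',
        # PromQL regex matchers are fully anchored, so each name must match exactly.
        'alertname!~"{}"'.format("|".join(allowed_alerts)),
    ]
    if ignored_namespaces:
        # Series without a namespace label are treated as namespace="" and
        # therefore still pass this negative matcher.
        matchers.append('namespace!~"{}"'.format("|".join(ignored_namespaces)))
    return "ALERTS{" + ",".join(matchers) + "}"


query = firing_alerts_query(ALLOWED_ALERTS, IGNORED_NAMESPACES)
print(query)
```

With a matcher like this, the KubeDaemonSetRolloutStuck alert for loki-promtail shown above would no longer be returned, while alerts in other namespaces would still be reported.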