- Bug
- Resolution: Done
- Critical
Yesterday a major DPCR outage, and with it a Loki outage, took the system down entirely. As a result, the following test fails:
[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]
[
  {
    "metric": {
      "__name__": "ALERTS",
      "alertname": "KubeDaemonSetRolloutStuck",
      "alertstate": "firing",
      "container": "kube-rbac-proxy-main",
      "daemonset": "loki-promtail",
      "endpoint": "https-main",
      "job": "kube-state-metrics",
      "namespace": "openshift-e2e-loki",
      "prometheus": "openshift-monitoring/k8s",
      "service": "kube-state-metrics",
      "severity": "warning"
    },
    "value": [
      1709071917.851,
      "1"
    ]
  }
]
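The query this test uses should be adapted to exclude all alerts from the openshift-e2e-loki namespace, so that a failure of the CI-only Loki/promtail deployment does not fail the test.
Ideally this would be backported, but if that turns out to be too cumbersome we could fix it going forward only.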
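A rough sketch of what the namespace exclusion could look like, written in Python only for illustration. The helper name `firing_alerts_query` and the overall shape of the query are assumptions and not the test's actual code; only the `ALERTS` metric, the `alertstate`, `alertname` and `namespace` labels, and the two allowed alert names come from the output and test name above.

```python
import re

# Hypothetical helper: builds a PromQL selector for firing alerts.
# The real test's query may be structured differently.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_-]+")


def firing_alerts_query(allowed_alerts, excluded_namespaces):
    """Return a PromQL query for firing alerts, skipping allowed alerts
    and everything in the excluded namespaces."""
    for name in list(allowed_alerts) + list(excluded_namespaces):
        # Only plain names are accepted so no regex escaping is needed.
        if not _PLAIN_NAME.fullmatch(name):
            raise ValueError("unsupported name: %r" % name)

    matchers = ['alertstate="firing"']
    if allowed_alerts:
        matchers.append('alertname!~"%s"' % "|".join(allowed_alerts))
    if excluded_namespaces:
        # Prometheus regex matchers are fully anchored, so this only
        # drops series whose namespace label equals one of the names.
        matchers.append('namespace!~"%s"' % "|".join(excluded_namespaces))
    return "ALERTS{%s} >= 1" % ",".join(matchers)


query = firing_alerts_query(
    ["Watchdog", "AlertmanagerReceiversNotConfigured"],
    ["openshift-e2e-loki"],
)
print(query)
```

Because a negative regex matcher also matches series that lack the label, alerts without a `namespace` label would still be reported under this approach.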