Bug
Resolution: Done-Errata
Major
4.16.0, 4.17
Description of problem:
Component Readiness reveals a potential regression with the following test:
[sig-node][invariant] alert/TargetDown should not be at or above info in ns/kube-system
The test details link currently shows 3 recent failures similar to the following:
Jun 12 07:48:09.154 - 58s W namespace/kube-system alert/TargetDown alertstate/firing severity/warning ALERTS{alertname="TargetDown", alertstate="firing", job="kubelet", namespace="kube-system", prometheus="openshift-monitoring/k8s", service="kubelet", severity="warning"}
This test ran 139 times in the 4 weeks before 4.15 GA and never failed once. It has failed 3 out of 31 times in the last week, and all of those failures occurred since yesterday.
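To put a rough number on how unlikely 0/139 followed by 3/31 is by chance, a one-sided Fisher's exact test on the 2x2 table (basis runs vs. recent runs, pass vs. fail) is one reasonable check. This is a minimal sketch using only the counts above; the function name is hypothetical, and we do not claim this is exactly the statistic Component Readiness computes.

```python
from math import comb


def one_sided_fisher_p(base_fail, base_runs, sample_fail, sample_runs):
    """One-sided Fisher's exact test (hypothetical helper).

    Under the null hypothesis that failures are spread at random across all
    runs, return the hypergeometric probability that the recent sample
    contains at least `sample_fail` of the total failures.
    """
    total_runs = base_runs + sample_runs
    total_fail = base_fail + sample_fail
    denom = comb(total_runs, total_fail)
    numer = 0
    for k in range(sample_fail, min(total_fail, sample_runs) + 1):
        # Ways to put k failures in the sample and the rest in the basis;
        # comb() returns 0 when total_fail - k exceeds base_runs.
        numer += comb(sample_runs, k) * comb(base_runs, total_fail - k)
    return numer / denom


if __name__ == "__main__":
    # Basis: 0 failures in 139 runs before 4.15 GA; sample: 3 failures in 31 runs.
    p = one_sided_fisher_p(base_fail=0, base_runs=139, sample_fail=3, sample_runs=31)
    print(f"one-sided p-value: {p:.4f}")
```

Because all three failures fall in the recent sample, the p-value reduces to C(31,3)/C(170,3), roughly 0.006, which is consistent with treating this as a real regression rather than noise.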
Relevant Slack thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1718200183009099
It seems that the kubelet /metrics and /metrics/cadvisor endpoints fell over and later recovered.
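To illustrate why a temporary outage of kubelet endpoints surfaces as TargetDown with job="kubelet", here is a minimal sketch of TargetDown-style logic. It assumes the upstream kube-prometheus rule, which as we understand it fires when more than 10% of the scrape targets sharing a (job, namespace, service) group report up == 0 for some duration; the rule OpenShift actually ships may differ, and the `for` duration is not modeled here. The function names, the target dictionaries, and the 6-node cluster in the example are all hypothetical.

```python
from collections import defaultdict

# Assumed threshold, modeled on the upstream kube-prometheus TargetDown rule;
# OpenShift's shipped rule may use a different expression or threshold.
THRESHOLD_PCT = 10.0


def down_percentage_by_group(targets):
    """Return {(job, namespace, service): percent of targets with up == 0}."""
    total = defaultdict(int)
    down = defaultdict(int)
    for t in targets:
        key = (t["job"], t["namespace"], t["service"])
        total[key] += 1
        if t["up"] == 0:
            down[key] += 1
    return {key: 100.0 * down[key] / total[key] for key in total}


def firing_groups(targets, threshold_pct=THRESHOLD_PCT):
    """Groups whose down percentage exceeds the threshold (duration ignored)."""
    pct = down_percentage_by_group(targets)
    return {key: p for key, p in pct.items() if p > threshold_pct}


if __name__ == "__main__":
    # Hypothetical cluster: 6 nodes, each scraped at /metrics and /metrics/cadvisor.
    targets = []
    for node in range(6):
        for path in ("/metrics", "/metrics/cadvisor"):
            targets.append({
                "job": "kubelet", "namespace": "kube-system", "service": "kubelet",
                "node": f"node-{node}", "path": path,
                # Simulate node-0 losing both endpoints at the same time.
                "up": 0 if node == 0 else 1,
            })
    print(firing_groups(targets))
```

In this hypothetical cluster, losing both endpoints on a single node takes 2 of 12 kubelet targets down (about 16.7%), which is already enough to cross a 10% threshold, whereas a single failed target (about 8.3%) would not be.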
- clones: OCPBUGS-35371 Kubelet metrics endpoints experiencing prolonged outages (Closed)
- depends on: OCPBUGS-35371 Kubelet metrics endpoints experiencing prolonged outages (Closed)
- is cloned by: OCPBUGS-57289 [4.17] Kubelet metrics endpoints experiencing prolonged outages (Closed)
- is depended on by: OCPBUGS-57289 [4.17] Kubelet metrics endpoints experiencing prolonged outages (Closed)
- links to: RHBA-2025:9269 OpenShift Container Platform 4.18.18 bug fix update