-
Ticket
-
Resolution: Unresolved
-
Normal
The test "[Monitor:legacy-test-framework-invariants-alerts][Unknown][invariant] alert/KubePodNotReady should not be at or above info in ns/default"
seems to have regressed in CI aggregated jobs, with details like:
{ KubePodNotReady was at or above info for at least 4m0s on platformidentification.JobType{Release:"4.22", FromRelease:"4.21", Platform:"azure", Architecture:"amd64", Network:"ovn", Topology:"ha"} (maxAllowed=0s): pending for 1m56s, firing for 4m0s:
Feb 26 10:11:42.563 - 240s W namespace/default pod/verify-all-openshiftredhatoperators-r5d87-96xh7 alert/KubePodNotReady alertstate/firing severity/warning ALERTS{alertname="KubePodNotReady", alertstate="firing", namespace="default", pod="verify-all-openshiftredhatoperators-r5d87-96xh7", prometheus="openshift-monitoring/k8s", severity="warning"}}
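For triage, the numbers in the failure detail above can be sanity-checked with a small sketch. This is not the invariant test's actual implementation: the helper names `parse_go_duration` and `exceeds_allowed` are hypothetical, and it assumes the failure condition is simply "firing duration is strictly greater than maxAllowed". It only handles the h/m/s units that appear in the message.

```python
import re

# Seconds per unit for the Go-style duration units seen in the failure message.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_go_duration(text: str) -> int:
    """Hypothetical helper: convert a duration like "4m0s" or "1m56s" to seconds."""
    parts = re.findall(r"(\d+)([hms])", text)
    # Reject anything that is not fully made of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"unsupported duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def exceeds_allowed(firing: str, max_allowed: str) -> bool:
    """Assumed check: the alert fails the invariant if it fired longer than allowed."""
    return parse_go_duration(firing) > parse_go_duration(max_allowed)


# Values copied from the failure detail: firing for 4m0s, maxAllowed=0s.
print(exceeds_allowed("4m0s", "0s"))  # prints True
```

With maxAllowed=0s, any firing time at all trips the check under this assumption, and the 4m0s firing duration matches the 240s interval reported on the alert line.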
example aggregated fail: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/aggregated-aws-ovn-upgrade-4.22-minor-release-openshift-release-analysis-aggregator/2026880209889792000
example individual hit: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-[]…from-stable-4.21-e2e-azure-ovn-upgrade/2026898179982626816
This looks like something in monitoring or maybe OLM, possibly related to https://github.com/openshift/operator-framework-operator-controller/pull/638
Slack: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1772104070978709