Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: 4.20
Component: Quality / Stability / Reliability
Severity: Moderate
(Feel free to update this bug's summary to be more specific.)
Component Readiness has found a potential regression in the following test:
[sig-arch] events should not repeat pathologically for ns/openshift-monitoring
Significant regression detected.
Fisher's exact probability of a regression: 99.99%.
Test pass rate dropped from 100.00% to 92.31%.
Sample (being evaluated) Release: 4.20
Start Time: 2025-09-26T00:00:00Z
End Time: 2025-10-03T08:00:00Z
Success Rate: 92.31%
Successes: 36
Failures: 3
Flakes: 0
Base (historical) Release: 4.18
Start Time: 2025-01-26T00:00:00Z
End Time: 2025-02-25T00:00:00Z
Success Rate: 100.00%
Successes: 145
Failures: 0
Flakes: 0
View the test details report for additional context.
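For context on where the 99.99% figure comes from: Component Readiness compares the sample and base pass/fail counts with a Fisher's exact test. As a rough illustration only (a pure-Python sketch; the tool's actual inputs and adjustments, such as flake handling, may differ, so the resulting probability will not necessarily match the report's number), the one-sided tail probability for counts like those above can be computed from the hypergeometric distribution:

```python
from math import comb

def fisher_one_sided(sample_fail, sample_pass, base_fail, base_pass):
    """One-sided Fisher's exact test: probability of seeing at least
    sample_fail failures in the sample column, given pooled totals."""
    n_sample = sample_fail + sample_pass
    total_fail = sample_fail + base_fail
    total = n_sample + base_fail + base_pass
    # Hypergeometric upper tail: P(X >= sample_fail)
    p = 0.0
    for k in range(sample_fail, min(n_sample, total_fail) + 1):
        p += (comb(total_fail, k)
              * comb(total - total_fail, n_sample - k)
              / comb(total, n_sample))
    return p

# Counts from the report: sample 3 failures / 36 passes, base 0 / 145.
p = fisher_one_sided(3, 36, 0, 145)
print(f"p = {p:.5f}")  # a small p means the pass-rate drop is unlikely to be chance
```

With a base of 145 straight passes, even three sample failures yield a small p-value, which is why the minimum-failure threshold mentioned below matters for when the regression is flagged.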
The failure occurs in other configurations as well, but it is rare enough overall that it had not surfaced before. Today it appeared in this specific metal report because it happened to hit the minimum of three failures.
The error message is:
[sig-arch] events should not repeat pathologically for ns/openshift-monitoring (0s)
{ 1 events happened too frequently
event happened 25 times, something is wrong: namespace/openshift-monitoring node/worker-0 pod/prometheus-k8s-0 hmsg/357171899f - reason/Unhealthy Readiness probe errored: rpc error: code = Unknown desc = command error: cannot register an exec PID: container is stopping, stdout: , stderr: , exit code -1 (12:24:14Z) result=reject }
It appears the events occur just after the monitoring operator upgrades; see this chart.
Note that this test is intended to protect the API server.
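At root, the check aggregates per-event repeat counts and fails the run when any event repeats past a cutoff, since rapidly repeating events hammer the API server. A minimal sketch of that idea (the real implementation lives in openshift/origin; the message pattern and the threshold of 20 used here are assumptions for illustration):

```python
import re

# Assumed cutoff: events repeating more than this many times are "pathological".
THRESHOLD = 20
PATTERN = re.compile(r"event happened (\d+) times")

def pathological_events(lines, threshold=THRESHOLD):
    """Return the subset of event lines whose repeat count exceeds the threshold."""
    bad = []
    for line in lines:
        m = PATTERN.search(line)
        if m and int(m.group(1)) > threshold:
            bad.append(line)
    return bad

msg = ("event happened 25 times, something is wrong: "
       "namespace/openshift-monitoring pod/prometheus-k8s-0 "
       "reason/Unhealthy Readiness probe errored")
print(pathological_events([msg]))  # the 25-repeat event above would be rejected
```

Under this model, the prometheus-k8s-0 readiness-probe event at 25 repeats exceeds the cutoff, which matches the `result=reject` in the failure output.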
Global test analysis can be used to find these failures across all jobs, and CI search can show the specific failures over the past two days; they are quite common globally.
Filed by: dgoodwin@redhat.com