Bug
Resolution: Done-Errata
Normal
4.19.0
Important
None
3
MCO Sprint 269, MCO Sprint 270
2
False
Release Note Not Required
In Progress
(Feel free to update this bug's summary to be more specific.)
Component Readiness has found a potential regression in the following test:
[sig-architecture] platform pods in ns/openshift-cluster-node-tuning-operator should not exit an excessive amount of times
Extreme regression detected.
Fisher's Exact probability of a regression: 100.00%.
Test pass rate dropped from 100.00% to 78.95%.
Sample (being evaluated) Release: 4.19
Start Time: 2025-04-07T00:00:00Z
End Time: 2025-04-14T08:00:00Z
Success Rate: 78.95%
Successes: 15
Failures: 4
Flakes: 0
Base (historical) Release: 4.18
Start Time: 2025-03-15T00:00:00Z
End Time: 2025-04-14T08:00:00Z
Success Rate: 100.00%
Successes: 63
Failures: 0
Flakes: 0
View the test details report for additional context.
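For reference, the significance call above can be roughly reproduced from the raw counts with a one-sided Fisher's exact test on the 2x2 success/failure table (a minimal sketch using scipy; Component Readiness applies its own thresholds and adjustments, so the numbers may not match exactly):

```python
# Rough sketch: re-run the Fisher's exact test on the counts reported above.
# scipy is assumed to be available; Component Readiness' own math may differ.
from scipy.stats import fisher_exact

table = [
    [15, 4],   # 4.19 sample: successes, failures (78.95% pass rate)
    [63, 0],   # 4.18 base:   successes, failures (100.00% pass rate)
]

# alternative="less": is the sample's odds of success lower than the base's?
_, p_value = fisher_exact(table, alternative="less")
print(f"p-value: {p_value:.4f}  (regression probability ~ {1 - p_value:.2%})")
```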
This is just one report of many affected pods.
[sig-architecture] platform pods in ns/openshift-cluster-csi-drivers should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-cluster-node-tuning-operator should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-dns should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-e2e-loki should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-image-registry should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-ingress-canary should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-insights should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-machine-config-operator should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-monitoring should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-multus should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-network-operator should not exit an excessive amount of times
[sig-architecture] platform pods in ns/openshift-ovn-kubernetes should not exit an excessive amount of times
Viewing the main 4.19 board, opening the regressed tests table (top right), and filtering on "excessive" shows these failures. They are all techpreview serial jobs.
Each failed test reports:
namespace/openshift-cluster-csi-drivers node/ip-10-0-118-86.us-west-1.compute.internal pod/aws-ebs-csi-driver-node-ltqf7 uid/8f00a4fb-a131-44dd-9889-d86fcfd4fd12 container/csi-driver restarted 4 times at:
  non-zero exit at 2025-04-13 16:27:13.364043829 +0000 UTC m=+5548.621212562: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted. The container used to be Running
  non-zero exit at 2025-04-13 16:30:11.801567727 +0000 UTC m=+5727.058736510: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted. The container used to be Running
  non-zero exit at 2025-04-13 17:11:10.301761644 +0000 UTC m=+8185.558930387: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted. The container used to be Running
  non-zero exit at 2025-04-13 17:14:20.099871246 +0000 UTC m=+8375.357039979: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted. The container used to be Running
namespace/openshift-cluster-csi-drivers node/ip-10-0-118-86.us-west-1.compute.internal pod/aws-ebs-csi-driver-node-ltqf7 uid/8f00a4fb-a131-44dd-9889-d86fcfd4fd12 container/csi-liveness-probe restarted 4 times at:
  non-zero exit at 2025-04-13 16:27:13.364046359 +0000 UTC m=+5548.621215092: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted. The container used to be Running
  non-zero exit at 2025-04-13 16:30:11.801570427 +0000 UTC m=+5727.058739160: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted. The container used to be Running
  non-zero exit at 2025-04-13 17:11:10.301764824 +0000 UTC m=+8185.558933557: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted. The container used to be Running
  non-zero exit at 2025-04-13 17:14:20.099873196 +0000 UTC m=+8375.357041929: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted. The container used to be Running
It is unclear why these containers are restarting right now. The first failure was April 12 at 9:16pm UTC; since then, 4 of 7 runs have hit this.
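To check whether a live cluster is still hitting this, one option is to scan platform container statuses for code 137 terminations like the ones above (a minimal sketch using the Python kubernetes client; it only sees restarts still recorded on existing pods, and the openshift-* filter is illustrative rather than what the origin test does):

```python
# Minimal sketch: flag openshift-* containers whose last termination was exit
# code 137, similar to the events reported by the failed test. Assumes a valid
# kubeconfig; only restarts still recorded on live pods will show up here.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_pod_for_all_namespaces().items:
    if not pod.metadata.namespace.startswith("openshift-"):
        continue
    for cs in pod.status.container_statuses or []:
        term = cs.last_state.terminated if cs.last_state else None
        if term and term.exit_code == 137:
            print(f"{pod.metadata.namespace}/{pod.metadata.name} "
                  f"container={cs.name} restarts={cs.restart_count} "
                  f"reason={term.reason} at={term.finished_at}")
```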
impacts account:
  TRT-2082 Multiple pods exiting an excessive amount of times on techpreview serial (Closed)
is depended on by:
  MCO-1520 [API 5/6] Create 5 tests in openshift/origin for GA readiness signal (Closed)
relates to:
  TRT-2083 Introduce disruption/slow suite and jobs (New)
links to:
  RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update