OCPBUGS-54951: Multiple pods exiting an excessive amount of times on techpreview serial


    • Severity: Important
    • Sprint: MCO Sprint 269, MCO Sprint 270
    • Release Note Type: Release Note Not Required
    • Status: In Progress

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      [sig-architecture] platform pods in ns/openshift-cluster-node-tuning-operator should not exit an excessive amount of times

      Extreme regression detected.
      Fisher's Exact probability of a regression: 100.00%.
      Test pass rate dropped from 100.00% to 78.95%.

      Sample (being evaluated) Release: 4.19
      Start Time: 2025-04-07T00:00:00Z
      End Time: 2025-04-14T08:00:00Z
      Success Rate: 78.95%
      Successes: 15
      Failures: 4
      Flakes: 0

      Base (historical) Release: 4.18
      Start Time: 2025-03-15T00:00:00Z
      End Time: 2025-04-14T08:00:00Z
      Success Rate: 100.00%
      Successes: 63
      Failures: 0
      Flakes: 0
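
      For reference, the regression probability can be reproduced approximately from the counts above with a standard Fisher's Exact test. A minimal sketch in Python, assuming scipy is available; the exact Component Readiness formulation may differ:

      from scipy.stats import fisher_exact

      # Contingency table built from the counts above.
      # rows: [4.19 sample, 4.18 base]; columns: [successes, failures]
      table = [[15, 4],
               [63, 0]]

      # alternative="less" asks whether the sample's odds of success are lower
      # than the base's, i.e. whether the pass rate regressed.
      _, p_value = fisher_exact(table, alternative="less")

      print(f"p-value: {p_value:.6f}")
      # One plausible reading of the reported "probability of a regression" is 1 - p.
      print(f"1 - p: {(1 - p_value) * 100:.2f}%")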

      View the test details report for additional context.

      This is just one report of many affected pods.

      [sig-architecture] platform pods in ns/openshift-cluster-csi-drivers should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-cluster-node-tuning-operator should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-dns should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-e2e-loki should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-image-registry should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-ingress-canary should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-insights should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-machine-config-operator should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-monitoring should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-multus should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-network-operator should not exit an excessive amount of times
      [sig-architecture] platform pods in ns/openshift-ovn-kubernetes should not exit an excessive amount of times

      Viewing the main 4.19 board, opening the regressed tests table (top right), and filtering on "excessive" shows these failures. They are all techpreview serial jobs.

      Each failed test reports:

      namespace/openshift-cluster-csi-drivers node/ip-10-0-118-86.us-west-1.compute.internal pod/aws-ebs-csi-driver-node-ltqf7 uid/8f00a4fb-a131-44dd-9889-d86fcfd4fd12 container/csi-driver restarted 4 times at:
      non-zero exit at 2025-04-13 16:27:13.364043829 +0000 UTC m=+5548.621212562: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
      non-zero exit at 2025-04-13 16:30:11.801567727 +0000 UTC m=+5727.058736510: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
      non-zero exit at 2025-04-13 17:11:10.301761644 +0000 UTC m=+8185.558930387: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
      non-zero exit at 2025-04-13 17:14:20.099871246 +0000 UTC m=+8375.357039979: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
      
      namespace/openshift-cluster-csi-drivers node/ip-10-0-118-86.us-west-1.compute.internal pod/aws-ebs-csi-driver-node-ltqf7 uid/8f00a4fb-a131-44dd-9889-d86fcfd4fd12 container/csi-liveness-probe restarted 4 times at:
      non-zero exit at 2025-04-13 16:27:13.364046359 +0000 UTC m=+5548.621215092: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
      non-zero exit at 2025-04-13 16:30:11.801570427 +0000 UTC m=+5727.058739160: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
      non-zero exit at 2025-04-13 17:11:10.301764824 +0000 UTC m=+8185.558933557: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
      non-zero exit at 2025-04-13 17:14:20.099873196 +0000 UTC m=+8375.357041929: cause/ContainerStatusUnknown code/137 reason/ContainerExit The container could not be located when the pod was deleted.  The container used to be Running
      

      It is unclear why these containers are restarting. The first failure was April 12 at 9:16pm UTC; since then, 4 of 7 runs have hit this.
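
      To check whether the same containers keep exiting outside of an e2e run, one quick way is to dump each pod's last terminated state. A minimal sketch using the kubernetes Python client, assuming kubeconfig access to an affected cluster (namespace taken from the report above):

      from kubernetes import client, config

      config.load_kube_config()  # or config.load_incluster_config() from inside the cluster
      v1 = client.CoreV1Api()

      namespace = "openshift-cluster-csi-drivers"  # any namespace from the list above
      for pod in v1.list_namespaced_pod(namespace).items:
          for cs in (pod.status.container_statuses or []):
              term = cs.last_state.terminated if cs.last_state else None
              if cs.restart_count and term:
                  print(f"{pod.metadata.name}/{cs.name}: restarts={cs.restart_count} "
                        f"exit={term.exit_code} reason={term.reason} finished={term.finished_at}")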

        rh-ee-ijanssen Isabella Janssen
        rhn-engineering-dgoodwin Devan Goodwin
        Prachiti Talgulkar
        Votes: 1
        Watchers: 11
