Bug
Resolution: Done-Errata
Critical
4.19.0
Quality / Stability / Reliability
Important
Approved
(Feel free to update this bug's summary to be more specific.)
We have a complex situation where a small number (3-4) of metal IPv6 upgrade job runs have failed. This has caused a number of test failures to regress in Component Readiness, producing close to 15 regressions at a time when we desperately need stability.
The following tests are related:
[sig-storage] [sig-api-machinery] secret-upgrade
[sig-arch][Feature:ClusterUpgrade] Cluster should remain functional during upgrade [Disruptive] [Serial]
[sig-network] can collect network.openshift.io/disruption-actor=poller,network.openshift.io/disruption-target=host-to-pod poller pod logs
[sig-network] can collect network.openshift.io/disruption-actor=poller,network.openshift.io/disruption-target=pod-to-service poller pod logs
[sig-network] can collect network.openshift.io/disruption-actor=poller,network.openshift.io/disruption-target=host-to-service poller pod logs
[sig-network] can collect network.openshift.io/disruption-actor=poller,network.openshift.io/disruption-target=pod-to-pod poller pod logs
[sig-network-edge] Verify DNS availability during and after upgrade success
[Jira:"Network / ovn-kubernetes"] monitor test pod-network-avalibility setup
[sig-apps] replicaset-upgrade
[sig-apps] job-upgrade
[sig-apps] daemonset-upgrade
[sig-apps] deployment-upgrade
[sig-storage] [sig-api-machinery] configmap-upgrade
Significant regression detected.
Fisher's Exact probability of a regression: 99.99%.
Test pass rate dropped from 100.00% to 93.88%.
Sample (being evaluated) Release: 4.19
Start Time: 2025-04-21T00:00:00Z
End Time: 2025-04-28T12:00:00Z
Success Rate: 93.88%
Successes: 46
Failures: 3
Flakes: 0
Base (historical) Release: 4.18
Start Time: 2025-01-26T00:00:00Z
End Time: 2025-02-25T23:59:59Z
Success Rate: 100.00%
Successes: 185
Failures: 0
Flakes: 0
View the test details report for additional context.
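For context on where that probability comes from: Component Readiness applies a Fisher's exact test to the success/failure counts above. Below is a minimal sketch of the underlying calculation using scipy; the exact figure the tool reports may differ because it applies its own flake handling and adjustments, so treat this as illustrative only.
from scipy.stats import fisher_exact

# 2x2 contingency table built from the counts reported above:
# rows are the 4.19 sample and the 4.18 base, columns are successes and failures.
table = [[46, 3],    # 4.19 sample: 46 successes, 3 failures
         [185, 0]]   # 4.18 base:  185 successes, 0 failures

# One-sided test: is the sample's success rate lower than the base's?
_, p_value = fisher_exact(table, alternative="less")
print(f"Probability of a regression: {(1 - p_value) * 100:.2f}%")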
The job runs I see causing this:
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-upgrade-ovn-ipv6/1916341644907515904
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-upgrade-ovn-ipv6/1915775815010750464
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-upgrade-runc/1916679378356408320
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-metal-ipi-ovn-upgrade-runc/1915592257860276224
These may be failing due to mirroring issues; one of the failed runs shows:
time="2025-04-27T05:55:36Z" level=info msg="event interval matches E2EImagePullBackOff" locator="
{Kind map[hmsg:0aee5d1c29 namespace:e2e-k8s-sig-apps-job-upgrade-8966 node:worker-0.ostest.test.metalkube.org pod:foo-wb247]}" message="
{BackOff Back-off pulling image \"virthost.ostest.test.metalkube.org:5000/localimages/local-test-image:e2e-7-registry-k8s-io-e2e-test-images-busybox-1-36-1-1-n3BezCOfxp98l84K\" map[firstTimestamp:2025-04-27T05:55:36Z lastTimestamp:2025-04-27T05:55:36Z reason:BackOff]}"
Most of these tests involve pulling images, so I suspect the whole batch could be caused by mirroring problems, but the new mirroring test (added in https://issues.redhat.com/browse/OCPBUGS-48630) does not appear to be complaining.
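One way to check that suspicion is to scan a failed run's gathered event intervals for BackOff events that reference the local mirror registry. A minimal sketch in Python, assuming an intervals/events JSON shaped like {"items": [{"locator": ..., "message": ...}]}; the file name and field names are assumptions, not the exact artifact schema:
import json
import sys

# Hypothetical helper: count BackOff events that mention the local mirror
# registry in a downloaded intervals/events JSON from one of the job runs.
MIRROR = "virthost.ostest.test.metalkube.org:5000"

def count_mirror_backoffs(path):
    with open(path) as f:
        data = json.load(f)
    hits = 0
    for item in data.get("items", []):
        message = str(item.get("message", ""))
        if "BackOff" in message and MIRROR in message:
            hits += 1
            print(item.get("locator", ""), message[:120])
    return hits

if __name__ == "__main__":
    total = count_mirror_backoffs(sys.argv[1])
    print(f"{total} BackOff events referencing {MIRROR}")
If the count tracks the failed tests across all four runs above, that would strengthen the mirroring theory; if it is zero, the mirroring test's silence is probably telling us something.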
In any case, this is a mass of regressions on Component Readiness, which is a release blocker. For whatever reason, upgrade does not look sufficiently stable compared to past releases and must be fixed. These regressions may roll off eventually, but the failure pattern is consistent and could thus return at any time.
Links to: RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update