- Bug
- Resolution: Won't Do
- Undefined
- None
- 4.20
- None
- Quality / Stability / Reliability
- False
Description of problem:
It appears that a problem fetching the microshift-version configmap early in the test suite setup causes every test on that parallel thread to be marked as failed with the suite setup failure. This creates fake results for tests that have nothing to do with this part of the suite setup; these failures should either not be counted as failures (?) or somehow be filtered so that they don't appear in component readiness.

Slack thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1747653530945769?thread_ts=1747390032.669289&cid=C01CQA76KMX

I0518 16:53:52.374581 83074 test_setup.go:94] Extended test version 4.20.0-202505170601.p2.g7aea48c.assembly.stream.el9-7aea48c
I0518 16:53:52.374606 83074 test_context.go:558] Tolerating taints "node-role.kubernetes.io/control-plane" when considering if nodes are ready
I0518 16:54:52.412479 83074 framework.go:2313] error accessing microshift-version configmap: Get "https://api.ostest.test.metalkube.org:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": context deadline exceeded
error: Get "https://api.ostest.test.metalkube.org:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": context deadline exceeded

Example job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-ipv6-techpreview/1924105652884475904

Searching the JUnit for "microshift-version configmap not found" yields 74 test failures:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-ipv6-techpreview/1924105652884475904/artifacts/e2e-metal-ipi-ovn-ipv6-techpreview/baremetalds-e2e-test/artifacts/junit/junit_e2e__20250518-160410.xml

In the above XML there are many more failures, all of which appear to be communication failures during suite setup, and importantly they occur before the test itself actually executes. For example, tests such as https://github.com/openshift/origin/blob/7aea48cab7d448568dd9edcf757d269af92adb2b/test/extended/networking/route_advertisements.go#L100 fail and negatively impact component readiness, even though they have nothing to do with the suite setup failure.

The suspicion is:
* Ginkgo runs the tests in parallel, splitting them across many threads.
* In some cases, one thread has a connectivity issue to the API server for some reason.
* This connectivity issue manifests as the MicroShift error seen here.
* All tests that were assigned to that thread then fail with the MicroShift error.
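For illustration only, below is a minimal Go sketch of what such a MicroShift detection probe during suite setup could look like. This is not the actual openshift/origin code; the function name isMicroShift, the 60-second timeout, and the client wiring are assumptions. The point is that a transient API error such as "context deadline exceeded" is a different situation from the configmap genuinely not existing, and conflating the two (or failing the whole per-thread setup on a transient error) is what turns one connectivity hiccup into dozens of unrelated test failures.

// Hypothetical sketch of a MicroShift detection probe run during suite setup.
// NOT the actual openshift/origin implementation; names and timeouts are assumptions.
package main

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// isMicroShift reports whether the cluster exposes the microshift-version
// configmap in kube-public. A transient API error (e.g. context deadline
// exceeded) is returned as an error rather than being treated as
// "configmap not found", so the caller can retry instead of failing the
// whole setup for every test assigned to this worker.
func isMicroShift(ctx context.Context, client kubernetes.Interface) (bool, error) {
	probeCtx, cancel := context.WithTimeout(ctx, 60*time.Second)
	defer cancel()

	_, err := client.CoreV1().ConfigMaps("kube-public").Get(probeCtx, "microshift-version", metav1.GetOptions{})
	switch {
	case err == nil:
		return true, nil // configmap exists: MicroShift cluster
	case apierrors.IsNotFound(err):
		return false, nil // configmap definitively absent: not MicroShift
	default:
		// Connectivity problems (timeouts, TLS errors, ...) land here and
		// should not be reported as the configmap being missing.
		return false, fmt.Errorf("error accessing microshift-version configmap: %w", err)
	}
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	fmt.Println(isMicroShift(context.Background(), client))
}

Per the log above the real check lives in the origin framework (framework.go:2313); the sketch only illustrates why a timeout during this probe should be retried or surfaced as an infrastructure problem rather than failing every spec on the thread.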
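As a second illustration, the sketch below shows one way the JUnit search mentioned above could be mechanized to identify (and potentially filter out) tests that failed only because of the suite setup error. The file name, the marker string, and the assumption that the report's root element is a single <testsuite> are taken from the artifact linked above but are not guaranteed to match origin's exact JUnit layout.

// Hypothetical sketch: count JUnit failures caused by the suite setup error,
// i.e. tests that never actually executed their own body.
package main

import (
	"encoding/xml"
	"fmt"
	"os"
	"strings"
)

type testSuite struct {
	XMLName   xml.Name   `xml:"testsuite"`
	TestCases []testCase `xml:"testcase"`
}

type testCase struct {
	Name    string   `xml:"name,attr"`
	Failure *failure `xml:"failure"`
}

type failure struct {
	Message string `xml:"message,attr"`
	Text    string `xml:",chardata"`
}

func main() {
	data, err := os.ReadFile("junit_e2e__20250518-160410.xml")
	if err != nil {
		panic(err)
	}
	var suite testSuite
	if err := xml.Unmarshal(data, &suite); err != nil {
		panic(err)
	}

	// Failures containing this string come from suite setup, not the test itself.
	const marker = "microshift-version configmap not found"
	setupFailures := 0
	for _, tc := range suite.TestCases {
		if tc.Failure != nil &&
			(strings.Contains(tc.Failure.Message, marker) || strings.Contains(tc.Failure.Text, marker)) {
			setupFailures++
		}
	}
	fmt.Printf("%d test failures caused by suite setup rather than the tests themselves\n", setupFailures)
}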
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
- is related to: OCPBUGS-56921 Mass In-cluster Disruption / Test Failures - Verified