- Bug
- Resolution: Won't Do
- Undefined
- None
- 4.20
- None
- Quality / Stability / Reliability
- False
Description of problem:
It appears that a problem fetching the microshift-version configmap early in the test suite setup causes every test on that parallel thread to be marked as failed with the suite setup failure. This creates fake results for tests that have nothing to do with this part of the suite setup; these failures should either not be counted as failures (?) or somehow be filtered so that they don't appear in component readiness.

Slack thread: https://redhat-internal.slack.com/archives/C01CQA76KMX/p1747653530945769?thread_ts=1747390032.669289&cid=C01CQA76KMX

I0518 16:53:52.374581 83074 test_setup.go:94] Extended test version 4.20.0-202505170601.p2.g7aea48c.assembly.stream.el9-7aea48c
I0518 16:53:52.374606 83074 test_context.go:558] Tolerating taints "node-role.kubernetes.io/control-plane" when considering if nodes are ready
I0518 16:54:52.412479 83074 framework.go:2313] error accessing microshift-version configmap: Get "https://api.ostest.test.metalkube.org:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": context deadline exceeded
error: Get "https://api.ostest.test.metalkube.org:6443/api/v1/namespaces/kube-public/configmaps/microshift-version": context deadline exceeded

Example job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-ipv6-techpreview/1924105652884475904

Searching the JUnit for "microshift-version configmap not found" yields 74 test failures:
https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.20-e2e-metal-ipi-ovn-ipv6-techpreview/1924105652884475904/artifacts/e2e-metal-ipi-ovn-ipv6-techpreview/baremetalds-e2e-test/artifacts/junit/junit_e2e__20250518-160410.xml

In the above XML there are many more failures, all of which appear to be communication failures during suite setup, and importantly they occur before the test itself actually executes. For example, tests such as https://github.com/openshift/origin/blob/7aea48cab7d448568dd9edcf757d269af92adb2b/test/extended/networking/route_advertisements.go#L100 fail and negatively impact component readiness, even though they have nothing to do with the suite setup failure.

The suspicion is:
* Ginkgo runs the tests in parallel, splitting them across many threads.
* In some cases, one thread has a connectivity issue to the API server for some reason.
* This connectivity issue manifests as the MicroShift error seen here.
* All tests that were assigned to that thread then fail with the MicroShift error.
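For illustration only, below is a minimal Go sketch of what such a MicroShift detection probe during suite setup could look like. This is not the actual openshift/origin code; the function name isMicroShift, the 60-second timeout, and the client wiring are assumptions. The point is that a transient API error such as "context deadline exceeded" is a different situation from the configmap genuinely not existing, and conflating the two (or failing the whole per-thread setup on a transient error) is what turns one connectivity hiccup into dozens of unrelated test failures.

// Hypothetical sketch of a MicroShift detection probe run during suite setup.
// NOT the actual openshift/origin implementation; names and timeouts are assumptions.
package main

import (
	"context"
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// isMicroShift reports whether the cluster exposes the microshift-version
// configmap in kube-public. A transient API error (e.g. context deadline
// exceeded) is returned as an error rather than being treated as
// "configmap not found", so the caller can retry instead of failing the
// whole setup for every test assigned to this worker.
func isMicroShift(ctx context.Context, client kubernetes.Interface) (bool, error) {
	probeCtx, cancel := context.WithTimeout(ctx, 60*time.Second)
	defer cancel()

	_, err := client.CoreV1().ConfigMaps("kube-public").Get(probeCtx, "microshift-version", metav1.GetOptions{})
	switch {
	case err == nil:
		return true, nil // configmap exists: MicroShift cluster
	case apierrors.IsNotFound(err):
		return false, nil // configmap definitively absent: not MicroShift
	default:
		// Connectivity problems (timeouts, TLS errors, ...) land here and
		// should not be reported as the configmap being missing.
		return false, fmt.Errorf("error accessing microshift-version configmap: %w", err)
	}
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	fmt.Println(isMicroShift(context.Background(), client))
}

Per the log above the real check lives in the origin framework (framework.go:2313); the sketch only illustrates why a timeout during this probe should be retried or surfaced as an infrastructure problem rather than failing every spec on the thread.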
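As a second illustration, the sketch below shows one way the JUnit search mentioned above could be mechanized to identify (and potentially filter out) tests that failed only because of the suite setup error. The file name, the marker string, and the assumption that the report's root element is a single <testsuite> are taken from the artifact linked above but are not guaranteed to match origin's exact JUnit layout.

// Hypothetical sketch: count JUnit failures caused by the suite setup error,
// i.e. tests that never actually executed their own body.
package main

import (
	"encoding/xml"
	"fmt"
	"os"
	"strings"
)

type testSuite struct {
	XMLName   xml.Name   `xml:"testsuite"`
	TestCases []testCase `xml:"testcase"`
}

type testCase struct {
	Name    string   `xml:"name,attr"`
	Failure *failure `xml:"failure"`
}

type failure struct {
	Message string `xml:"message,attr"`
	Text    string `xml:",chardata"`
}

func main() {
	data, err := os.ReadFile("junit_e2e__20250518-160410.xml")
	if err != nil {
		panic(err)
	}
	var suite testSuite
	if err := xml.Unmarshal(data, &suite); err != nil {
		panic(err)
	}

	// Failures containing this string come from suite setup, not the test itself.
	const marker = "microshift-version configmap not found"
	setupFailures := 0
	for _, tc := range suite.TestCases {
		if tc.Failure != nil &&
			(strings.Contains(tc.Failure.Message, marker) || strings.Contains(tc.Failure.Text, marker)) {
			setupFailures++
		}
	}
	fmt.Printf("%d test failures caused by suite setup rather than the tests themselves\n", setupFailures)
}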
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
- is related to: OCPBUGS-56921 Mass In-cluster Disruption / Test Failures - Verified