Bug
Resolution: Unresolved
Critical
4.22
Approved
Below is human-written content
There seems to be an intermittent but recurring set of UDN network segmentation failures on 4.22 RHCOS 10. One example of such a job is https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.22-e2e-gcp-ovn-rt-rhcos10-techpreview/2018465846719942656
More details to follow. So far we only know that the test fails at random times, then passes a few times, then fails again.
Interestingly, it always seems to fail together with
[sig-node][apigroup:config.openshift.io] CPU Partitioning node validation should have correct cpuset and cpushare set in crio containers [Suite:openshift/conformance/parallel]
EDIT 1
https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.22-e2e-gcp-ovn-rt-rhcos10-techpreview/2018518738810179584 is a much calmer CI run that exposes the same issue. The run I pasted above has some nodes going NotReady during the test run.
EDIT 2
For a moment we thought this was caused by, or related to, the CPU partitioning bug in the RT kernel, but there is a test run that failed on a non-RT kernel: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview/2016570724088549376
Below is default sippy content
(Feel free to update this bug's summary to be more specific.)
Component Readiness has found a potential regression in the following test:
[sig-network][OCPFeatureGate:NetworkSegmentation][Feature:UserDefinedPrimaryNetworks] when using openshift ovn-kubernetes created using NetworkAttachmentDefinitions is isolated from the default network with L3 primary UDN [Suite:openshift/conformance/parallel]
Significant regression detected.
Fisher's Exact probability of a regression: 99.99%.
Test pass rate dropped from 100.00% to 90.24%.
Sample (being evaluated) Release: 4.22
Start Time: 2026-01-27T00:00:00Z
End Time: 2026-02-03T04:00:00Z
Success Rate: 90.24%
Successes: 36
Failures: 4
Flakes: 1
Base (historical) Release: 4.21
Start Time: 2026-01-04T00:00:00Z
End Time: 2026-02-03T04:00:00Z
Success Rate: 100.00%
Successes: 85
Failures: 0
Flakes: 0
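The success rates above are consistent with flakes being counted as passes: (36 + 1) / 41 ≈ 90.24% for the sample and 85 / 85 = 100% for the base. A minimal sketch of that arithmetic follows, together with a textbook one-sided Fisher's exact test on the same counts. The function names are hypothetical, and counting flakes as passes is an assumption inferred from the numbers. How Sippy derives its reported regression probability is not stated here, so this sketch is not expected to reproduce the 99.99% figure.

```python
from math import comb


def pass_rate(successes: int, failures: int, flakes: int) -> float:
    """Pass rate with flakes counted as passes (assumption inferred from the report)."""
    total = successes + failures + flakes
    return (successes + flakes) / total


def fisher_one_sided(sample_fail: int, sample_total: int,
                     base_fail: int, base_total: int) -> float:
    """One-sided Fisher's exact p-value.

    Probability, under the hypergeometric null (failures spread at random
    across all runs), that the sample contains at least `sample_fail` failures.
    """
    n_total = sample_total + base_total
    k_fail = sample_fail + base_fail
    denom = comb(n_total, sample_total)
    upper = min(k_fail, sample_total)
    return sum(
        comb(k_fail, x) * comb(n_total - k_fail, sample_total - x)
        for x in range(sample_fail, upper + 1)
    ) / denom


if __name__ == "__main__":
    # Counts copied from the sample (4.22) and base (4.21) sections above.
    print(f"sample pass rate: {pass_rate(36, 4, 1):.2%}")  # 90.24%
    print(f"base pass rate:   {pass_rate(85, 0, 0):.2%}")  # 100.00%
    print(f"one-sided p-value: {fisher_one_sided(4, 41, 0, 85):.6f}")
```

Keeping the pass-rate and significance calculations separate makes it easy to rerun either one with a different treatment of flakes.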
View the test details report for additional context.
Below is an AI-generated description
Sippy AI-assisted description; please review details for accuracy.
Filed from: Test Regression Details
Test Name
[sig-network][OCPFeatureGate:NetworkSegmentation][Feature:UserDefinedPrimaryNetworks] when using openshift ovn-kubernetes created using NetworkAttachmentDefinitions is isolated from the default network with L3 primary UDN [Suite:openshift/conformance/parallel]
Brief Overview
Significant regression detected. Fisher's Exact probability of a regression: 99.99%. Test pass rate dropped from 100.00% to 90.24%.
Statistics Section
Release: 4.22
Time Period: 2026-01-27T00:00:00Z to 2026-02-03T04:00:00Z
Success Rate: 90.24%
Successes: 36
Failures: 4
Flakes: 1
Release: 4.21
Time Period: 2026-01-04T00:00:00Z to 2026-02-03T04:00:00Z
Success Rate: 100.00%
Successes: 85
Failures: 0
Flakes: 0
Sample Failure Outputs
Job Run ID: 2016329483056844800
- Error executing test process: wrapped process failed: exit status 1
- Reporting job state 'failed' with reason 'executing_graph:step_failed:utilizing_lease:executing_test:executing_multi_stage_test'
Job Run ID: 2016823463020335104
- Error executing test process: wrapped process failed: exit status 124
- Process did not finish before 10m0s timeout
- Reporting job state 'failed' with reason 'executing_graph:step_failed:utilizing_lease:executing_test:executing_multi_stage_test'
Job Run ID: 2017360593417146368
- Error executing test process: wrapped process failed: exit status 1
- Failed to create pod sandbox: rpc error: code = Unknown desc = failed to find runtime handler test-handler from runtime list
- Readiness probe failed: Get "http://10.131.0.25:81/": dial tcp 10.131.0.25:81: connect: connection refused
- Reporting job state 'failed' with reason 'executing_graph:step_failed:utilizing_lease:executing_test:executing_multi_stage_test'
Job Run ID: 2018465846719942656
- Error executing test process: wrapped process failed: exit status 1
- Readiness probe failed: Get "http://10.131.0.17:81/": dial tcp 10.131.0.17:81: connect: connection refused
- Reporting job state 'failed' with reason 'executing_graph:step_failed:utilizing_lease:executing_test:executing_multi_stage_test'
Links to Relevant Jobs
- periodic-ci-openshift-release-master-nightly-4.22-e2e-gcp-ovn-rt-rhcos10-techpreview
Patterns and Insights
The test has regressed significantly, with its success rate dropping from 100% in the base release (4.21) to 90.24% in the sample release (4.22). The failures appear to be consistent across multiple job runs, all stemming from the `periodic-ci-openshift-release-master-nightly-4.22-e2e-gcp-ovn-rt-rhcos10-techpreview` job.
Common error messages include "Error executing test process" with "wrapped process failed: exit status 1" or "exit status 124". Two of the four sample failures also show "Readiness probe failed: ... connect: connection refused", which points to networking or pod readiness problems. One run hit the 10m0s timeout (exit status 124), and a different run failed to create a pod sandbox because the runtime handler `test-handler` was not found. Taken together, this suggests a systemic issue affecting test execution or the underlying environment, possibly related to network connectivity, the container runtime, or resource availability during test execution. The flake in the sample period is a further sign of instability.
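To check which error patterns recur across the sample failures, the messages can be tallied per job run. The sketch below is illustrative only: the job run IDs and message text are copied from the sample failure outputs above, while the signature labels and the `tally` helper are hypothetical names introduced for this example.

```python
from collections import Counter

# Message text copied from the sample failure outputs, keyed by job run ID.
FAILURES = {
    "2016329483056844800":
        "Error executing test process: wrapped process failed: exit status 1",
    "2016823463020335104":
        "Error executing test process: wrapped process failed: exit status 124 "
        "Process did not finish before 10m0s timeout",
    "2017360593417146368":
        "Error executing test process: wrapped process failed: exit status 1 "
        "Failed to create pod sandbox: rpc error: code = Unknown desc = "
        "failed to find runtime handler test-handler from runtime list "
        'Readiness probe failed: Get "http://10.131.0.25:81/": '
        "dial tcp 10.131.0.25:81: connect: connection refused",
    "2018465846719942656":
        "Error executing test process: wrapped process failed: exit status 1 "
        'Readiness probe failed: Get "http://10.131.0.17:81/": '
        "dial tcp 10.131.0.17:81: connect: connection refused",
}

# Hypothetical labels mapped to substrings that identify each failure mode.
SIGNATURES = {
    "timeout": "exit status 124",
    "runtime handler missing": "failed to find runtime handler",
    "readiness probe refused": "connect: connection refused",
}


def tally(failures: dict[str, str], signatures: dict[str, str]) -> Counter:
    """Count how many job runs contain each signature substring."""
    counts = Counter()
    for text in failures.values():
        for label, needle in signatures.items():
            if needle in text:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    for label, count in tally(FAILURES, SIGNATURES).most_common():
        print(f"{label}: {count} of {len(FAILURES)} runs")
```

Substring matching is deliberately crude here; it is enough to see that the connection-refused readiness failure is the only pattern shared by more than one run.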
Filed by: mkowalsk@redhat.com
Relates to: OCPBUGS-48320 [sig-network][OCPFeatureGate:NetworkSegmentation][Feature:UserDefinedPrimaryNetworks] when using openshift ovn-kubernetes created using UserDefinedNetwork is isolated from the default network with L2 primary UDN (Closed)