OpenShift Bugs / OCPBUGS-74973

UDN network segmentation intermittent failures on RHCOS10


      Below is human-written content

      There seems to be an intermittent but regular set of UDN network segmentation failures on 4.22 RHCOS 10. One example of such a job is https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.22-e2e-gcp-ovn-rt-rhcos10-techpreview/2018465846719942656

      More details to follow. Nothing more is known yet, except that the failure happens at random times: the test fails, then succeeds a few times, then fails again.

      Interestingly, this test always seems to fail together with:

      [sig-node][apigroup:config.openshift.io] CPU Partitioning node validation should have correct cpuset and cpushare set in crio containers [Suite:openshift/conformance/parallel]
      

      EDIT 1

      https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.22-e2e-gcp-ovn-rt-rhcos10-techpreview/2018518738810179584 is a much calmer CI run that exposes the same issue; the run pasted above also had some Node NotReady events during the test run.

      EDIT 2

      For a moment we thought it was caused by, or related to, the CPU partitioning bug in the RT kernel, but there is also a test run that failed on a non-RT kernel: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.22-e2e-azure-ovn-rhcos10-techpreview/2016570724088549376

      Below is the default Sippy content

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      [sig-network][OCPFeatureGate:NetworkSegmentation][Feature:UserDefinedPrimaryNetworks] when using openshift ovn-kubernetes created using NetworkAttachmentDefinitions is isolated from the default network with L3 primary UDN [Suite:openshift/conformance/parallel]

      Significant regression detected.
      Fisher's Exact probability of a regression: 99.99%.
      Test pass rate dropped from 100.00% to 90.24%.

      Sample (being evaluated) Release: 4.22
      Start Time: 2026-01-27T00:00:00Z
      End Time: 2026-02-03T04:00:00Z
      Success Rate: 90.24%
      Successes: 36
      Failures: 4
      Flakes: 1
      Base (historical) Release: 4.21
      Start Time: 2026-01-04T00:00:00Z
      End Time: 2026-02-03T04:00:00Z
      Success Rate: 100.00%
      Successes: 85
      Failures: 0
      Flakes: 0
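      The "Fisher's Exact probability" above comes from a one-sided Fisher's exact test on the 2x2 pass/fail table for the two releases. A minimal stdlib sketch using the counts from this report (36/4 vs 85/0, with the flake ignored; Sippy's exact accounting, e.g. of flakes, may differ, so the resulting percentage will not match 99.99% exactly):

```python
from math import comb

def fisher_one_sided(fail_a, pass_a, fail_b, pass_b):
    """One-sided Fisher's exact test: probability of seeing at least
    fail_a failures in sample A under the hypergeometric null that
    failures are spread randomly across both samples."""
    n_a = fail_a + pass_a            # runs in sample A
    total = n_a + fail_b + pass_b    # runs in both samples
    k_total = fail_a + fail_b        # failures in both samples
    p = 0.0
    for k in range(fail_a, min(n_a, k_total) + 1):
        # hypergeometric pmf: P(exactly k of the failures land in sample A)
        p += comb(k_total, k) * comb(total - k_total, n_a - k) / comb(total, n_a)
    return p

# Counts from the report: 4.22 sample (4 failures, 36 successes)
# vs. 4.21 base (0 failures, 85 successes).
p = fisher_one_sided(4, 36, 0, 85)
print(f"p-value: {p:.5f}, 'probability of regression': {1 - p:.2%}")
```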

      View the test details report for additional context.

      Below is an AI-generated description

      ⚠️ AI-Generated Content

      Sippy AI-assisted description; please review details for accuracy.

      Filed from: Test Regression Details

      Test Name

      [sig-network][OCPFeatureGate:NetworkSegmentation][Feature:UserDefinedPrimaryNetworks] when using openshift ovn-kubernetes created using NetworkAttachmentDefinitions is isolated from the default network with L3 primary UDN [Suite:openshift/conformance/parallel]

      Brief Overview

      Significant regression detected. Fisher's Exact probability of a regression: 99.99%. Test pass rate dropped from 100.00% to 90.24%.

      Statistics Section

      Sample (being evaluated)

      Release: 4.22
      Time Period: 2026-01-27T00:00:00Z to 2026-02-03T04:00:00Z
      Success Rate: 90.24%
      Successes: 36
      Failures: 4
      Flakes: 1

      Base (historical)

      Release: 4.21
      Time Period: 2026-01-04T00:00:00Z to 2026-02-03T04:00:00Z
      Success Rate: 100.00%
      Successes: 85
      Failures: 0
      Flakes: 0

      Sample Failure Outputs

      Job Run ID: 2016329483056844800
      Error executing test process: wrapped process failed: exit status 1
      Reporting job state 'failed' with reason 'executing_graph:step_failed:utilizing_lease:executing_test:executing_multi_stage_test'
      
      Job Run ID: 2016823463020335104
      Error executing test process: wrapped process failed: exit status 124
      Process did not finish before 10m0s timeout
      Reporting job state 'failed' with reason 'executing_graph:step_failed:utilizing_lease:executing_test:executing_multi_stage_test'
      
      Job Run ID: 2017360593417146368
      Error executing test process: wrapped process failed: exit status 1
      Failed to create pod sandbox: rpc error: code = Unknown desc = failed to find runtime handler test-handler from runtime list
      Readiness probe failed: Get "http://10.131.0.25:81/": dial tcp 10.131.0.25:81: connect: connection refused
      Reporting job state 'failed' with reason 'executing_graph:step_failed:utilizing_lease:executing_test:executing_multi_stage_test'
      
      Job Run ID: 2018465846719942656
      Error executing test process: wrapped process failed: exit status 1
      Readiness probe failed: Get "http://10.131.0.17:81/": dial tcp 10.131.0.17:81: connect: connection refused
      Reporting job state 'failed' with reason 'executing_graph:step_failed:utilizing_lease:executing_test:executing_multi_stage_test'
      

      Links to Relevant Jobs

      Patterns and Insights

      The test has regressed significantly, with its success rate dropping from 100% in the base release (4.21) to 90.24% in the sample release (4.22). The failures appear to be consistent across multiple job runs, all stemming from the `periodic-ci-openshift-release-master-nightly-4.22-e2e-gcp-ovn-rt-rhcos10-techpreview` job.

      Common error messages include "Error executing test process" and "wrapped process failed: exit status 1" or "exit status 124". Several failures also show "Readiness probe failed: ... connect: connection refused", indicating potential networking or pod readiness issues. One instance also reported a timeout and a failure to find a runtime handler for a pod sandbox. This suggests a systemic issue affecting test execution or the underlying environment, possibly related to network connectivity, container runtime, or resource availability during test execution. The presence of a flake in the sample period further indicates instability.
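      The "Readiness probe failed ... connect: connection refused" lines mean an httpGet-style probe found nothing listening on the pod IP and port yet. A minimal sketch of what such a probe checks (an illustration only, not the kubelet's implementation; the host and port are placeholders, not values from the job logs):

```python
import http.client

def http_ready(host: str, port: int, path: str = "/", timeout: float = 2.0) -> bool:
    """Return True if GET http://host:port/path answers with a 2xx/3xx
    status, roughly what an httpGet readiness probe considers 'ready'."""
    try:
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        conn.request("GET", path)
        ok = 200 <= conn.getresponse().status < 400
        conn.close()
        return ok
    except OSError:
        # Covers "connect: connection refused" (ConnectionRefusedError)
        # and timeouts: the server pod is not serving on that port yet.
        return False
```

In the failing runs, the probe's GET against port 81 hit this refused-connection path, i.e. the test server pod on the user-defined network never came up to serve.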

      Filed by: mkowalsk@redhat.com

              rh-ee-arsen Arkadeep Sen (Aurko)
              mkowalsk@redhat.com Mat Kowalski