Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version: None
Affects Versions: 4.20.z, 4.21.0, 4.22.0
Description of problem:
EgressIP test failures in 4.20, 4.21, and 4.22, first seen in payloads:
4.20.0-0.nightly-2026-01-09-015458
4.21.0-0.nightly-2026-01-09-013516
4.22.0-0.nightly-2026-01-09-023214
We do not see a common code change across those releases, but the payloads did contain rebuilt images that we believe included a Go version bump.

We saw a similar issue in OCPBUGS-72411, where a CVE fix in Go exposed an existing issue within CNO that had worked prior to the CVE update.
weliang1@redhat.com provided the following analysis of the failures:
RCA on one 4.21 job:
Executive Summary
Four EgressIP test cases failed because pod traffic egressed with the node's internal IP address (10.0.171.187) instead of the assigned EgressIP (10.0.160.5). Despite correct EgressIP configuration, OVN-Kubernetes failed to SNAT traffic through the designated EgressIP node.
Root Cause Analysis (RCA)
Technical Breakdown:
What Happened:
EgressIP 10.0.160.5 was correctly assigned to node ip-10-0-165-72
Test prober pod was scheduled on a different node: ip-10-0-171-187
Expected: Pod traffic should be SNATed to EgressIP 10.0.160.5
Actual: Traffic egressed with source IP 10.0.171.187 (the node's own IP)
Result: Packet sniffer found map[10.0.171.187:10] instead of expected map[10.0.160.5:...]
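The pass/fail condition above reduces to comparing the sniffer's per-source-IP counts against the assigned EgressIP. A minimal sketch (the helper name and structure are illustrative, not taken from the test suite):

```python
# Illustrative sketch: the e2e test tallies source IPs seen by the packet
# sniffer and passes only if every request egressed with the EgressIP.

EGRESS_IP = "10.0.160.5"   # assigned EgressIP (from the logs)
NODE_IP = "10.0.171.187"   # the prober pod's node internal IP
REQUESTS = 10              # requests the prober sends

def verify_egress(found: dict[str, int]) -> bool:
    """Pass only if all requests were SNATed to the EgressIP."""
    return found.get(EGRESS_IP, 0) >= REQUESTS

# Healthy capture vs. the failing capture observed in this bug:
assert verify_egress({EGRESS_IP: REQUESTS}) is True
assert verify_egress({NODE_IP: REQUESTS}) is False  # the observed failure
```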
How EgressIP Should Work (Expected Flow)
┌─────────────────────────────────────────────────────────────────┐
│ Step-by-Step Expected Traffic Flow: │
└─────────────────────────────────────────────────────────────────┘
1. Prober Pod (10.128.2.133) on Node ip-10-0-171-187
│
│ HTTP GET to external target (54.68.26.160)
│
▼
2. OVN Logical Router on ip-10-0-171-187
│
│ EgressIP policy match: Redirect to ip-10-0-165-72
│
▼
3. Geneve/STT Tunnel to Node ip-10-0-165-72
│
│ Traffic tunneled to EgressIP-designated node
│
▼
4. OVN SNAT on Node ip-10-0-165-72
│
│ Source NAT: 10.128.2.133 → 10.0.160.5 (EgressIP)
│
▼
5. External Network via br-ex
│
│ Packet egresses with source IP: 10.0.160.5 ✓
│
▼
6. Packet Sniffer on ip-10-0-165-72 Captures Traffic
│
└─ Expected: "10.0.160.5" in tcpdump logs
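The expected flow above can be modeled as a small routing-decision sketch (a toy model of the assumed behavior, not OVN-Kubernetes code):

```python
# Toy model of the expected EgressIP datapath: a logical-router policy on
# the pod's node reroutes matched pod traffic to the EgressIP node, which
# then SNATs it to the EgressIP. Names are illustrative.

NODE_IPS = {
    "ip-10-0-171-187": "10.0.171.187",
    "ip-10-0-165-72": "10.0.165.72",
}

def route_packet(pod_node, egress_node, egress_ip, policy_ok=True):
    """Return (egressing node, egress source IP) for a matched pod."""
    if policy_ok:
        # Policy match: tunnel to the EgressIP node, SNAT to the EgressIP.
        return egress_node, egress_ip
    # No policy match: default egress path, SNAT to the node's own IP.
    return pod_node, NODE_IPS[pod_node]

node, src = route_packet("ip-10-0-171-187", "ip-10-0-165-72", "10.0.160.5")
assert (node, src) == ("ip-10-0-165-72", "10.0.160.5")  # expected flow
```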
What Actually Happened (Failure)
┌─────────────────────────────────────────────────────────────────┐
│ ACTUAL (BROKEN) Traffic Flow: │
└─────────────────────────────────────────────────────────────────┘
1. Prober Pod (10.128.2.133) on Node ip-10-0-171-187
│
│ HTTP GET to external target (54.68.26.160)
│
▼
2. OVN Logical Router on ip-10-0-171-187
│
│ ✗ EgressIP policy FAILED - No redirect to ip-10-0-165-72
│
▼
3. Default Route on Node ip-10-0-171-187
│
│ Traffic uses normal egress path (not EgressIP)
│
▼
4. SNAT to Node's Own IP
│
│ Source NAT: 10.128.2.133 → 10.0.171.187 (NODE IP, NOT EgressIP!)
│
▼
5. External Network via br-ex
│
│ Packet egresses with source IP: 10.0.171.187 ✗ WRONG!
│
▼
6. Packet Sniffer on ip-10-0-171-187 (wrong node!) Captures Traffic
│
└─ Actual: "10.0.171.187" in tcpdump logs (FAILURE)
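The failure mode reduces to the reroute policy not taking effect, so SNAT falls back to the node's own IP. A minimal sketch of that fallback (names are illustrative only):

```python
# Illustrative model of the broken flow: with the rerouting policy
# inactive, traffic takes the default egress path and is SNATed to the
# node's own IP - exactly the symptom the packet sniffer captured.

NODE_IPS = {"ip-10-0-171-187": "10.0.171.187"}

def egress_source(pod_node: str, egress_ip: str, policy_ok: bool) -> str:
    """Source IP a packet egresses with, given the policy state."""
    return egress_ip if policy_ok else NODE_IPS[pod_node]

# The observed failure: policy inactive, node IP leaks out.
assert egress_source("ip-10-0-171-187", "10.0.160.5", policy_ok=False) == "10.0.171.187"
```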
Evidence from the Logs
Expected EgressIP Assignment:
Line 1545: map[ip-10-0-165-72:[10.0.160.5] ip-10-0-171-187:[10.0.160.6]]
Line 1556: Egress IP object does have all IPs for map[10.0.160.5:ip-10-0-165-72]
✓ EgressIP 10.0.160.5 correctly assigned to node ip-10-0-165-72
Prober Pod Placement:
Line 2022: prober-podplq5v: Successfully assigned to ip-10-0-171-187
Line 2033: prober-podplq5v: eth0 [10.128.2.133/23]
✓ Pod scheduled on node ip-10-0-171-187 (different node - correct for test)
Traffic Verification (FAILURE):
Line 1710: Found map is: map[10.0.171.187:10]
✗ Packet sniffer found traffic from 10.0.171.187 (the pod's node IP)
✗ Expected to find traffic from 10.0.160.5 (the EgressIP)
Test Timeout:
Line 1557: Making sure that 10 requests with EgressIPs map[10.0.160.5:ip-10-0-165-72] were seen
Line 2055: Timed out after 120.533s. Expected <bool>: false to be true
✗ Test waited 120 seconds but never saw traffic from the EgressIP address
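The sniffer evidence can be checked mechanically; a hypothetical parser for the `Found map is: map[ip:count ...]` line format shown above (the helper name is illustrative):

```python
import re

# Hypothetical parser for the sniffer's "Found map is: map[ip:count ...]"
# log line, using the format quoted in the evidence above.

def parse_found_map(line: str) -> dict[str, int]:
    """Extract {source_ip: packet_count} from a 'Found map is:' log line."""
    m = re.search(r"map\[([^\]]*)\]", line)
    if not m:
        return {}
    # Each entry is "ip:count"; split on the last ':' so IPv4 dots survive.
    return {ip: int(n) for ip, n in
            (pair.rsplit(":", 1) for pair in m.group(1).split())}

found = parse_found_map("Found map is: map[10.0.171.187:10]")
assert found == {"10.0.171.187": 10}
# The test fails because the EgressIP never appears in the capture:
assert found.get("10.0.160.5", 0) == 0
```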
Impacted tests
[sig-network][Feature:EgressIP][apigroup:operator.openshift.io] [external-targets][apigroup:user.openshift.io][apigroup:security.openshift.io] pods should have the assigned EgressIPs and EgressIPs can be updated [Serial] [Suite:openshift/conformance/serial]
[sig-network][Feature:EgressIP][apigroup:operator.openshift.io] [external-targets][apigroup:user.openshift.io][apigroup:security.openshift.io] pods should keep the assigned EgressIPs when being rescheduled to another node [Serial] [Suite:openshift/conformance/serial]
[sig-network][Feature:EgressIP][apigroup:operator.openshift.io] [external-targets][apigroup:user.openshift.io][apigroup:security.openshift.io] pods should have the assigned EgressIPs and EgressIPs can be deleted and recreated [Skipped:azure][apigroup:route.openshift.io] [Serial] [Suite:openshift/conformance/serial]
[sig-network][Feature:EgressIP][apigroup:operator.openshift.io] [external-targets][apigroup:user.openshift.io][apigroup:security.openshift.io] only pods matched by the pod selector should have the EgressIPs [Serial] [Suite:openshift/conformance/serial]
Version-Release number of selected component (if applicable):
How reproducible:
Permafailing AWS serial jobs in payloads for 4.20, 4.21, and 4.22.
Examples
4.21-e2e-aws-ovn-serial-1of2/2010377540270559232
4.21-e2e-aws-ovn-techpreview-serial-1of3/2010389507437760512
4.21-e2e-aws-ovn-techpreview-serial-2of3/2010375508918800384
4.21-e2e-aws-ovn-techpreview-serial-3of3/2010375240890191872
Related issues:
TRT-2497 Egress tests failure caused a few nightly jobs to fail in 4.22/4.21/4.20