OpenShift Bugs / OCPBUGS-59234

[4.19] OCP 4.18.15 - EgressIP appears to remove and re-allocate endpoints periodically leading to packet loss (no egressIP migration between hosts observed)


    • Incidents & Support
    • Critical
    • CORENET Sprint 272, CORENET Sprint 273, CORENET Sprint 274
    • Done
    • Bug Fix
      * Before this update, intermittent egress internet protocol (IP) handling due to inconsistent state updates in `OVNkubernetes` caused packet drops. These packet drops affected network traffic flow. With this release, `OVNkubernetes` pods consistently use their assigned egress IPs. As a result, dropped packets are reduced and network traffic flow is improved. (link:https://issues.redhat.com/browse/OCPBUGS-59234[OCPBUGS-59234])

      Description of problem:

      • Observed the following behavior in the logs for a given ovnkube-node host running OVNkubernetes:
        • Running pods briefly stop using their assigned Egress IP for very short time intervals.
          Example:
          The firewall logs show the following dropped packets:
          2025-06-11 09:06:05    139.23.166.13  139.23.81.244    1521  drop
          2025-06-11 09:06:05    139.23.166.13  139.23.81.244    1521  drop
          
          At the same time, the following log entries are produced by the ovnkube-controller container: [...]
          
          Node (demchdc253x): I0611 09:06:05.802359 2103837 egressip.go:869] Adding pod egress IP status: {demchdc255x 139.23.166.51} for EgressIP: egress-di-pp and pod: di-pp/bisrte-bisout-745898fc95-mncvv/[10.195.163.47/24]
          Node (demchdc255x): I0611 09:06:05.800283 4127154 egressip.go:869] Adding pod egress IP status: {demchdc255x 139.23.166.51} for EgressIP: egress-di-pp and pod: di-pp/bisrte-bisout-745898fc95-mncvv/[10.195.163.47/32]
          Node (demchdc253x): I0611 09:06:05.800775 2103837 egressip.go:1042] Deleting pod egress IP status: {demchdc255x 139.23.166.51} for EgressIP: egress-di-pp and pod: bisrte-bisout-745898fc95-mncvv/di-pp
          Node (demchdc255x): I0611 09:06:05.799092 4127154 egressip.go:1042] Deleting pod egress IP status: {demchdc255x 139.23.166.51} for EgressIP: egress-di-pp and pod: bisrte-bisout-745898fc95-mncvv/di-pp
          [...]
          
          After this interlude, traffic continues to use the correct Egress IP. I found many instances where this happens only once for a pod, but also some instances where it happens multiple times during the runtime of a pod (see the log-scanning sketch after this list).
      • The behavior is observed on all ovnkube-node pods and across multiple EgressIPs.
      • No external sync issue with Argo CD (which manages the egress objects) was observed: Argo CD logs and access logs show no actions taken on the objects. The behavior seems very intermittent and unpredictable.
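      To quantify the churn, the following is a minimal log-scanning sketch (my illustration, not part of the original report). It parses the klog-style "Adding/Deleting pod egress IP status" lines quoted above from stdin and counts delete-then-add flaps per EgressIP/pod pair; the regex, the 5-second flap window, and the pod-key normalization are assumptions derived from the excerpt.

      #!/usr/bin/env python3
      # Minimal sketch: count "Deleting -> Adding" egress IP flaps per
      # EgressIP/pod pair in ovnkube-controller logs read from stdin.
      import re
      import sys
      from collections import defaultdict
      from datetime import datetime

      # klog prefix (MMDD HH:MM:SS.micros + PID), then the messages quoted
      # above; search() tolerates the "Node (...):" prefix in the excerpt.
      LINE = re.compile(
          r"[IWE](\d{4} \d{2}:\d{2}:\d{2}\.\d+)\s+\d+\s+egressip\.go:\d+\]\s+"
          r"(Adding|Deleting) pod egress IP status: \{[^}]*\} "
          r"for EgressIP: (\S+) and pod: (\S+)"
      )

      def pod_key(raw: str) -> str:
          # "Adding" lines log ns/pod/[cidr] while "Deleting" lines log
          # pod/ns, so normalize both to a sorted, CIDR-free key.
          parts = [p for p in raw.split("/")
                   if p and not p.startswith("[") and not p.endswith("]")]
          return "/".join(sorted(parts))

      events = defaultdict(list)  # (egressip, pod) -> [(time, action), ...]
      for line in sys.stdin:
          m = LINE.search(line)
          if not m:
              continue
          ts, action, eip, pod = m.groups()
          # klog omits the year; assume the capture stays within one year.
          events[(eip, pod_key(pod))].append(
              (datetime.strptime(ts, "%m%d %H:%M:%S.%f"), action))

      for (eip, pod), evs in sorted(events.items()):
          evs.sort()
          flaps = sum(1 for (t1, a1), (t2, a2) in zip(evs, evs[1:])
                      if a1 == "Deleting" and a2 == "Adding"
                      and (t2 - t1).total_seconds() < 5)  # assumed window
          if flaps:
              print(f"{eip} / {pod}: {flaps} delete->add flap(s) "
                    f"across {len(evs)} events")

      Feed it the controller logs, for example: oc logs -n openshift-ovn-kubernetes <ovnkube-node-pod> -c ovnkube-controller | python3 eip_flaps.py (the pod name is a placeholder and eip_flaps.py is a hypothetical file name).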

      Version-Release number of selected component (if applicable):

      • OCP 4.18.15

      How reproducible:

      • Working on an internal reproducer now; it is unclear how easy this is to replicate.

      Steps to Reproduce:

      1. Deploy a cluster on 4.18.15 and apply multiple EgressIPs across multiple namespaces.

      2. Observe pods periodically cycling the EgressIP state in the logs and dropping packets as a result.

      3. Observe that the EgressIPs are NOT moved to new nodes and remain consistently available (a status-watcher sketch follows below).
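      In support of step 3, the sketch below (my illustration, not from the report) polls the cluster-scoped EgressIP objects (group k8s.ovn.org, version v1) with the kubernetes Python client and reports whenever an egress IP's assigned node changes; kubeconfig-based access and the 10-second poll interval are assumptions.

      #!/usr/bin/env python3
      # Minimal sketch: watch EgressIP status and report node reassignment.
      import time

      from kubernetes import client, config  # pip install kubernetes

      config.load_kube_config()  # assumes kubeconfig-based cluster access
      api = client.CustomObjectsApi()

      last = {}  # egress IP address -> node it was last assigned to
      while True:
          eips = api.list_cluster_custom_object(
              group="k8s.ovn.org", version="v1", plural="egressips")
          for eip in eips.get("items", []):
              name = eip["metadata"]["name"]
              # status.items carries the (egressIP, node) assignments.
              for item in eip.get("status", {}).get("items", []):
                  ip, node = item["egressIP"], item["node"]
                  if last.get(ip, node) != node:
                      print(f"{name}: {ip} moved {last[ip]} -> {node}")
                  last[ip] = node
          time.sleep(10)  # assumed poll interval

      If the bug behaves as described, this watcher should stay silent even while the log-scanning sketch above reports flaps, confirming that the churn is local state cycling rather than egress IP migration between hosts.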

      Actual results:

      • EgressIP handling is intermittently unavailable, causing brief packet drops for affected pods.

      Expected results:

      • Consistent EgressIP handling is expected: pod traffic should always leave with the assigned egress IP.

      Additional info:

      • This cluster was recently inspected and repaired using the steps outlined in https://access.redhat.com/solutions/7125049 and https://issues.redhat.com/browse/OCPBUGS-57179. The OVNkube DBs have been rebuilt, and we are seeing this behavior after that process, so these are clean, newly created EgressIPs that the cluster is flapping.
      • The customer is using IPsec on these nodes, which is possibly a factor (a quick configuration check is sketched after this list).
      • Data is available in the first comment below.
      • Proactively tagging mkennell@redhat.com since this is the same customer/cluster, but filing a separate bug to ensure it is not conflated with the previous issue.
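      To pin down the IPsec angle, a minimal sketch (my illustration; the field path into the Network operator config is an assumption) reads the cluster Network operator object and prints its ipsecConfig stanza:

      #!/usr/bin/env python3
      # Minimal sketch: check whether IPsec is enabled on the default
      # OVN network, to correlate with the observed flapping.
      from kubernetes import client, config

      config.load_kube_config()
      api = client.CustomObjectsApi()

      net = api.get_cluster_custom_object(
          group="operator.openshift.io", version="v1",
          plural="networks", name="cluster")
      ovn = net["spec"]["defaultNetwork"].get("ovnKubernetesConfig", {})
      # An ipsecConfig stanza (even an empty one) means IPsec is enabled;
      # newer releases also expose a "mode" field. Field path is assumed.
      print("ipsecConfig:", ovn.get("ipsecConfig", "not set"))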

       

              Assignee: Martin Kennelly (mkennell@redhat.com)
              Reporter: Will Russell (rhn-support-wrussell)
              QA Contact: Huiran Wang