- Bug
- Resolution: Done-Errata
- Critical
- None
- 4.18.z
Description of problem:
- Observed the following behavior in the logs for a given ovnkube-node host running OVNkubernetes:
Running pods intermittently stop using their assigned Egress IP for very short time intervals. Example: the firewall logs show the following dropped packets:

2025-06-11 09:06:05 139.23.166.13 139.23.81.244 1521 drop
2025-06-11 09:06:05 139.23.166.13 139.23.81.244 1521 drop

At the same time, the following log entries are produced by the ovnkube-controller container:

[...]
Node (demchdc253x): I0611 09:06:05.802359 2103837 egressip.go:869] Adding pod egress IP status: {demchdc255x 139.23.166.51} for EgressIP: egress-di-pp and pod: di-pp/bisrte-bisout-745898fc95-mncvv/[10.195.163.47/24]
Node (demchdc255x): I0611 09:06:05.800283 4127154 egressip.go:869] Adding pod egress IP status: {demchdc255x 139.23.166.51} for EgressIP: egress-di-pp and pod: di-pp/bisrte-bisout-745898fc95-mncvv/[10.195.163.47/32]
Node (demchdc253x): I0611 09:06:05.800775 2103837 egressip.go:1042] Deleting pod egress IP status: {demchdc255x 139.23.166.51} for EgressIP: egress-di-pp and pod: bisrte-bisout-745898fc95-mncvv/di-pp
Node (demchdc255x): I0611 09:06:05.799092 4127154 egressip.go:1042] Deleting pod egress IP status: {demchdc255x 139.23.166.51} for EgressIP: egress-di-pp and pod: bisrte-bisout-745898fc95-mncvv/di-pp
[...]

After this interlude, traffic resumes using the correct Egress IP. In many instances this happens only once for a pod, but in some instances it happens multiple times during the runtime of a pod.
- The behavior is observed on all ovnkube-node pods, across multiple EgressIPs.
- No external sync issue with Argo CD (which manages the EgressIP objects) was observed: Argo CD logs and access logs show no actions taken on the objects. The behavior seems very intermittent and unpredictable.
Version-Release number of selected component (if applicable):
How reproducible:
- Working on an internal reproducer now; unclear how easy it is to replicate.
Steps to Reproduce:
1. Deploy a cluster on 4.18.15 and apply multiple EgressIPs across multiple namespaces.
2. Observe pods periodically cycling the EgressIP state in the logs and dropping packets as a result.
3. Observe that EgressIPs are NOT moved to new nodes and remain consistently assigned.
Actual results:
- EgressIP handling is intermittently unavailable.
Expected results:
- Consistent EgressIP handling, with no intermittent removal and re-allocation of endpoints.
Additional info:
- This cluster was recently reviewed and fixed using the steps outlined in https://access.redhat.com/solutions/7125049 and https://issues.redhat.com/browse/OCPBUGS-57179. The OVN-Kubernetes DBs have been rebuilt, and we are seeing this after that process, so these are clean "new" EgressIPs that the cluster is flapping.
- The customer is using IPsec for these nodes; possibly a factor?
- data available in first comment below.
- Proactively tagging mkennell@redhat.com about this since it is the same customer/cluster, but filing as a separate bug to ensure it is not conflated with the previous issue.
- clones
OCPBUGS-57433 [4.20] OCP 4.18.15 - EgressIP appears to remove and re-allocate endpoints periodically leading to packet loss (no egressIP migration between hosts observed) - Verified
- depends on
OCPBUGS-57433 [4.20] OCP 4.18.15 - EgressIP appears to remove and re-allocate endpoints periodically leading to packet loss (no egressIP migration between hosts observed) - Verified
- is cloned by
OCPBUGS-59371 [4.18] OCP 4.18.15 - EgressIP appears to remove and re-allocate endpoints periodically leading to packet loss (no egressIP migration between hosts observed) - Closed
- is depended on by
OCPBUGS-59371 [4.18] OCP 4.18.15 - EgressIP appears to remove and re-allocate endpoints periodically leading to packet loss (no egressIP migration between hosts observed) - Closed
- links to
RHBA-2025:11363 OpenShift Container Platform 4.19.5 bug fix update