Bug
Resolution: Unresolved
Major
4.16.z, 4.18.z
Quality / Stability / Reliability
False
Critical
CORENET Sprint 278
1
In Progress
Bug Fix
This is a clone of issue OCPBUGS-60468. The following is the description of the original issue:
—
Description of problem:
EgressIP failover is not working as expected.
In our setup, workloads are deployed on regular worker nodes. Two gateway nodes are tainted (no workloads are expected on them) and labeled with k8s.ovn.org/egress-assignable.
For an EgressIP object created with only one IPv4 address or only one IPv6 address, the EIP address is assigned to one of the gateway nodes. After that node is rebooted, the EIP address moves to the second gateway as expected, but communication between the pod and the external system starts failing. The behavior is the same for IPv4 and IPv6.
Version-Release number of selected component (if applicable):
OpenShift 4.18.10 - BareMetal, OVN
How reproducible:
It is systematic: the failure occurs on every attempt.
Steps to Reproduce:
1. Deploy OCP in dual-stack mode with two worker node roles: appworker and gateway.
The workload pod is deployed on an appworker node (regular worker nodes, no taints).
The gateway nodes are tainted and no workloads are expected on them; their purpose is to handle non-Multus ingress (MetalLB) and non-Multus egress using EgressIP.
The two gateway nodes are labeled with k8s.ovn.org/egress-assignable.
A VLAN interface is configured on the secondary interface using nmstate, with IPv4/IPv6 addresses for EgressIP purposes and a default route:
routes:
  config:
  - destination: 0.0.0.0/0
    metric: 999
    next-hop-address: 192.168.118.1
    next-hop-interface: vlan94
    table-id: 254
  - destination: ::/0
    metric: 999
    next-hop-address: 2600:52:7:94::1
    next-hop-interface: vlan94
    table-id: 254
2. Create one EgressIP for IPv6 or IPv4 with a single IP address, a namespaceSelector, and a podSelector.
Deploy a pod on an appworker node and, from inside the pod, try to reach an external system outside OCP, such as an HTTP server (here via its IPv6 address), using curl for example.
# oc get egressips.k8s.ovn.org -A -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    annotations:
      k8s.ovn.org/egressip-mark: "50025"
    creationTimestamp: "2025-08-13T13:04:28Z"
    generation: 2
    name: egressip-ipv6-vlan94
    resourceVersion: "7178450"
    uid: 26e932b2-b070-4747-a01f-3b6ee6043522
  spec:
    egressIPs:
    - 2600:52:7:94::30
    namespaceSelector:
      matchLabels:
        env: qa
    podSelector:
      matchLabels:
        egressip: ds
  status:
    items:
    - egressIP: 2600:52:7:94::30
      node: gateway-0.cwl.integration.core.bos2.lab
kind: List
metadata:
  resourceVersion: ""
Pod manifest
apiVersion: v1
kind: Pod
metadata:
  name: fedora-egressip-pod-ds
  namespace: test
  labels:
    egressip: ds
    egressipv4v6: ipv4v6
spec:
  containers:
  - name: fedora-curl
    image: quay.io/yogananth_subramanian/fedora-tools:latest
    command: ["/bin/bash", "-c", "sleep infinity"]
    securityContext:
      capabilities:
        add: ["NET_ADMIN"]
      privileged: true
  nodeSelector:
    node-role.kubernetes.io/appworker: ""
curl request from inside the pod to the HTTP server outside OCP, over IPv6:
while true; do date; curl -s -o /dev/null -w "%{http_code}\n" http://[2600:52:7:120::9]:8080; sleep 1; done
3. Reboot gateway-0 and check whether curl still works.
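To quantify how long connectivity is lost around the reboot, the curl loop from step 2 could be replaced by a small probe that records outage windows. The following is only a sketch under assumptions: the function names (`probe_http`, `outage_windows`, `sample`) are hypothetical helpers, not part of any OpenShift tooling, and a probe counts as successful only when the HTTP request completes without an error.

```python
import time
import urllib.request


def probe_http(url, timeout=2.0):
    """Return True if an HTTP GET to url completes without raising."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except Exception:
        # Timeouts, connection errors and HTTP error statuses all count as failure.
        return False


def outage_windows(samples):
    """Collapse (timestamp, ok) samples into a list of (start, end) outages.

    end is None when the last sample was still failing.
    """
    windows = []
    start = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts                    # first failed probe of an outage
        elif ok and start is not None:
            windows.append((start, ts))   # first successful probe ends it
            start = None
    if start is not None:
        windows.append((start, None))
    return windows


def sample(url, count, interval=1.0):
    """Probe url count times, interval seconds apart; return the samples."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), probe_http(url)))
        time.sleep(interval)
    return samples
```

For example, `outage_windows(sample("http://[2600:52:7:120::9]:8080", 600))` run inside the pod while gateway-0 reboots would list each period during which the server was unreachable. An outage whose end is `None` corresponds to the behavior described in this report, where traffic does not recover.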
Actual results:
The pod cannot reach the external system after gateway-0 is rebooted.
Expected results:
The EIP address moves to the second gateway node (gateway-1) and communication between the pod and the external system does not fail.
Additional info:
We checked a few things such as routing (iptables, ip rule, ...) and took a trace (tcpdump on gateway-1).
An EIP was created with one IP address, assigned to gateway-0:
[root@nokia-blueprint-jumphost egressIP]# oc get egressips.k8s.ovn.org -A -o yaml
apiVersion: v1
items:
- apiVersion: k8s.ovn.org/v1
  kind: EgressIP
  metadata:
    annotations:
      k8s.ovn.org/egressip-mark: "50025"
    creationTimestamp: "2025-08-13T13:04:28Z"
    generation: 2
    name: egressip-ipv6-vlan94
    resourceVersion: "7178450"
    uid: 26e932b2-b070-4747-a01f-3b6ee6043522
  spec:
    egressIPs:
    - 2600:52:7:94::30
    namespaceSelector:
      matchLabels:
        env: qa
    podSelector:
      matchLabels:
        egressip: ds
  status:
    items:
    - egressIP: 2600:52:7:94::30
      node: gateway-0.cwl.integration.core.bos2.lab
kind: List
metadata:
  resourceVersion: ""
ip6tables-save output; the SNAT rule is added on the node as expected:
-A OVN-KUBE-EGRESS-IP-MULTI-NIC -s fd01:200:0:a::b/128 -o vlan94 -j SNAT --to-source 2600:52:7:94::30
[core@gateway-0 ~]$ sudo ip6tables-save
# Generated by ip6tables-save v1.8.10 (nf_tables) on Wed Aug 13 13:05:27 2025
*mangle
:PREROUTING ACCEPT [1083153:2897158177]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1025173:104836097]
:POSTROUTING ACCEPT [0:0]
:KUBE-IPTABLES-HINT - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:OVN-KUBE-ITP - [0:0]
-A PREROUTING -m mark --mark 0x3f0 -j CONNMARK --save-mark --nfmask 0xffffffff --ctmask 0xffffffff
-A PREROUTING -m mark --mark 0x0 -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff
-A OUTPUT -j OVN-KUBE-ITP
COMMIT
# Completed on Wed Aug 13 13:05:27 2025
# Generated by ip6tables-save v1.8.10 (nf_tables) on Wed Aug 13 13:05:27 2025
*raw
:PREROUTING ACCEPT [1083527:2898656274]
:OUTPUT ACCEPT [1025640:104932254]
-A PREROUTING -p udp -m udp --dport 6081 -j NOTRACK
-A OUTPUT -p udp -m udp --dport 6081 -j NOTRACK
COMMIT
# Completed on Wed Aug 13 13:05:27 2025
# Generated by ip6tables-save v1.8.10 (nf_tables) on Wed Aug 13 13:05:27 2025
*filter
:INPUT ACCEPT [1047568:2893599421]
:FORWARD ACCEPT [257:20536]
:OUTPUT ACCEPT [1025236:104860408]
:KUBE-KUBELET-CANARY - [0:0]
-A INPUT -i ovn-k8s-mp0 -m comment --comment "from OVN to localhost" -j ACCEPT
-A FORWARD -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp6-port-unreachable
-A FORWARD -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp6-port-unreachable
-A FORWARD -i ovn-k8s-mp0 -j ACCEPT
-A FORWARD -o ovn-k8s-mp0 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp6-port-unreachable
-A OUTPUT -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp6-port-unreachable
COMMIT
# Completed on Wed Aug 13 13:05:27 2025
# Generated by ip6tables-save v1.8.10 (nf_tables) on Wed Aug 13 13:05:27 2025
*nat
:PREROUTING ACCEPT [593:37952]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [152736:12141555]
:POSTROUTING ACCEPT [152736:12141555]
:KUBE-KUBELET-CANARY - [0:0]
:OVN-KUBE-EGRESS-IP-MULTI-NIC - [0:0]
:OVN-KUBE-ETP - [0:0]
:OVN-KUBE-EXTERNALIP - [0:0]
:OVN-KUBE-ITP - [0:0]
:OVN-KUBE-NODEPORT - [0:0]
:OVN-KUBE-UDN-MASQUERADE - [0:0]
-A PREROUTING -j OVN-KUBE-ETP
-A PREROUTING -j OVN-KUBE-EXTERNALIP
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-EXTERNALIP
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-ITP
-A POSTROUTING -j OVN-KUBE-EGRESS-IP-MULTI-NIC
-A POSTROUTING -s fd69::1/128 -j MASQUERADE
-A POSTROUTING -s fd01:200:0:b::/64 -j MASQUERADE
-A POSTROUTING -j OVN-KUBE-UDN-MASQUERADE
-A OVN-KUBE-EGRESS-IP-MULTI-NIC -s fd01:200:0:a::b/128 -o vlan94 -j SNAT --to-source 2600:52:7:94::30
-A OVN-KUBE-UDN-MASQUERADE -s fd69::/125 -j RETURN
-A OVN-KUBE-UDN-MASQUERADE -d fd02:300::/112 -j RETURN
-A OVN-KUBE-UDN-MASQUERADE -s fd69::/112 -j MASQUERADE
COMMIT
# Completed on Wed Aug 13 13:05:27 2025
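When comparing dumps from both gateways, it can help to extract only the EgressIP SNAT rules rather than reading the full output. The sketch below assumes the rule layout shown in the dump above (`-s`, `-o`, `--to-source` options in the `OVN-KUBE-EGRESS-IP-MULTI-NIC` chain); `egress_snat_rules` is a hypothetical helper name.

```python
def egress_snat_rules(save_output, chain="OVN-KUBE-EGRESS-IP-MULTI-NIC"):
    """Extract SNAT rules appended to the given chain from ip6tables-save text.

    Returns a list of dicts with the pod source, output interface and SNAT address.
    """
    rules = []
    for line in save_output.splitlines():
        tokens = line.split()
        # Only rule lines of the form "-A <chain> ... -j SNAT ..." are of interest.
        if tokens[:2] != ["-A", chain] or "SNAT" not in tokens:
            continue

        def option(flag):
            # Return the token following flag, or None if the flag is absent.
            if flag in tokens and tokens.index(flag) + 1 < len(tokens):
                return tokens[tokens.index(flag) + 1]
            return None

        rules.append({
            "source": option("-s"),
            "out_if": option("-o"),
            "to_source": option("--to-source"),
        })
    return rules
```

Feeding it the output of `sudo ip6tables-save` from each gateway makes it easy to confirm that the same pod source address is translated to the same EgressIP on the same interface.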
The ip rule is configured to forward packets through the correct secondary interface:
[core@gateway-0 ~]$ ip -6 rule
0: from all lookup local
30: from all fwmark 0x1745ec lookup 7
6000: from fd01:200:0:a::b lookup 1016
32766: from all lookup main
[core@gateway-0 ~]$
[core@gateway-0 ~]$ ip -6 route show table 1016
2600:52:7:94::/64 dev vlan94 metric 1024 pref medium
fe80::/64 dev vlan94 metric 1024 pref medium
default via 2600:52:7:94::1 dev vlan94 metric 1024 pref medium
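The policy-routing check above (find the rule matching the pod address, then inspect the table it points to) can be scripted. This is a minimal sketch that assumes the `ip -6 rule` line format shown above; `tables_for_source` is a hypothetical helper name.

```python
def tables_for_source(ip_rule_output, source):
    """Return the routing tables that 'ip rule' output selects for a source address."""
    tables = []
    for line in ip_rule_output.splitlines():
        fields = line.split()
        # Expected shape: "<priority>:", "from", "<selector>", ..., "lookup", "<table>"
        if len(fields) < 5 or fields[1] != "from" or fields[2] != source:
            continue
        if "lookup" in fields:
            idx = fields.index("lookup") + 1
            if idx < len(fields):
                tables.append(fields[idx])
    return tables
```

With the gateway-0 output above, `tables_for_source(output, "fd01:200:0:a::b")` returns `["1016"]`, which is the table then inspected with `ip -6 route show table 1016`.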
After the reboot of gateway-0, the EIP moved from gateway-0 to gateway-1 as expected, but traffic from the pod to the external web server stopped working even though the configuration was applied on gateway-1:
[core@gateway-1 ~]$ sudo ip6tables-save
# Generated by ip6tables-save v1.8.10 (nf_tables) on Wed Aug 13 13:37:16 2025
*mangle
:PREROUTING ACCEPT [1122904:3001797840]
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1050722:107828338]
:POSTROUTING ACCEPT [0:0]
:KUBE-IPTABLES-HINT - [0:0]
:KUBE-KUBELET-CANARY - [0:0]
:OVN-KUBE-ITP - [0:0]
-A PREROUTING -m mark --mark 0x3f0 -j CONNMARK --save-mark --nfmask 0xffffffff --ctmask 0xffffffff
-A PREROUTING -m mark --mark 0x0 -j CONNMARK --restore-mark --nfmask 0xffffffff --ctmask 0xffffffff
-A OUTPUT -j OVN-KUBE-ITP
COMMIT
# Completed on Wed Aug 13 13:37:16 2025
# Generated by ip6tables-save v1.8.10 (nf_tables) on Wed Aug 13 13:37:16 2025
*raw
:PREROUTING ACCEPT [1123301:3003178819]
:OUTPUT ACCEPT [1051173:107905132]
-A PREROUTING -p udp -m udp --dport 6081 -j NOTRACK
-A OUTPUT -p udp -m udp --dport 6081 -j NOTRACK
COMMIT
# Completed on Wed Aug 13 13:37:16 2025
# Generated by ip6tables-save v1.8.10 (nf_tables) on Wed Aug 13 13:37:16 2025
*filter
:INPUT ACCEPT [1086805:2998160303]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1050741:107831996]
:KUBE-KUBELET-CANARY - [0:0]
-A INPUT -i ovn-k8s-mp0 -m comment --comment "from OVN to localhost" -j ACCEPT
-A FORWARD -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp6-port-unreachable
-A FORWARD -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp6-port-unreachable
-A FORWARD -i ovn-k8s-mp0 -j ACCEPT
-A FORWARD -o ovn-k8s-mp0 -j ACCEPT
-A OUTPUT -p tcp -m tcp --dport 22624 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp6-port-unreachable
-A OUTPUT -p tcp -m tcp --dport 22623 --tcp-flags FIN,SYN,RST,ACK SYN -j REJECT --reject-with icmp6-port-unreachable
COMMIT
# Completed on Wed Aug 13 13:37:16 2025
# Generated by ip6tables-save v1.8.10 (nf_tables) on Wed Aug 13 13:37:16 2025
*nat
:PREROUTING ACCEPT [464:30144]
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [157607:12528889]
:POSTROUTING ACCEPT [157607:12528889]
:KUBE-KUBELET-CANARY - [0:0]
:OVN-KUBE-EGRESS-IP-MULTI-NIC - [0:0]
:OVN-KUBE-ETP - [0:0]
:OVN-KUBE-EXTERNALIP - [0:0]
:OVN-KUBE-ITP - [0:0]
:OVN-KUBE-NODEPORT - [0:0]
:OVN-KUBE-UDN-MASQUERADE - [0:0]
-A PREROUTING -j OVN-KUBE-ETP
-A PREROUTING -j OVN-KUBE-EXTERNALIP
-A PREROUTING -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-EXTERNALIP
-A OUTPUT -j OVN-KUBE-NODEPORT
-A OUTPUT -j OVN-KUBE-ITP
-A POSTROUTING -j OVN-KUBE-EGRESS-IP-MULTI-NIC
-A POSTROUTING -s fd69::1/128 -j MASQUERADE
-A POSTROUTING -s fd01:200:0:7::/64 -j MASQUERADE
-A POSTROUTING -j OVN-KUBE-UDN-MASQUERADE
-A OVN-KUBE-EGRESS-IP-MULTI-NIC -s fd01:200:0:a::b/128 -o vlan94 -j SNAT --to-source 2600:52:7:94::30
-A OVN-KUBE-UDN-MASQUERADE -s fd69::/125 -j RETURN
-A OVN-KUBE-UDN-MASQUERADE -d fd02:300::/112 -j RETURN
-A OVN-KUBE-UDN-MASQUERADE -s fd69::/112 -j MASQUERADE
COMMIT
# Completed on Wed Aug 13 13:37:16 2025
[core@gateway-1 ~]$ ip -6 rule list
0: from all lookup local
30: from all fwmark 0x1745ec lookup 7
6000: from fd01:200:0:a::b lookup 1015
32766: from all lookup main
[core@gateway-1 ~]$
[core@gateway-1 ~]$ ip -6 route show table 1015
2600:52:7:94::/64 dev vlan94 metric 1024 pref medium
fe80::/64 dev vlan94 metric 1024 pref medium
default via 2600:52:7:94::1 dev vlan94 metric 1024 pref medium
[core@gateway-1 ~]$ ip address show vlan94
15: vlan94@tenant-bond1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9026 qdisc noqueue state UP group default qlen 1000
link/ether a0:88:c2:99:fa:d6 brd ff:ff:ff:ff:ff:ff
inet 192.168.118.21/27 brd 192.168.118.31 scope global noprefixroute vlan94
valid_lft forever preferred_lft forever
inet6 2600:52:7:94::30/128 scope global dadfailed tentative
valid_lft forever preferred_lft forever
inet6 2600:52:7:94::21/64 scope global noprefixroute
valid_lft forever preferred_lft forever
inet6 fe80::a288:c2ff:fe99:fad6/64 scope link noprefixroute
valid_lft forever preferred_lft forever
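In the `ip address show vlan94` output above, the EgressIP 2600:52:7:94::30/128 carries the flags `dadfailed tentative`. On Linux, `dadfailed` means IPv6 Duplicate Address Detection found the address already in use on the link, and the kernel does not use an address in that state. The report does not establish this as the root cause, but it may be worth checking on the node that receives the EIP after failover. A minimal sketch of such a check follows; `unusable_inet6` is a hypothetical helper name and the parsing assumes the output format shown above.

```python
def unusable_inet6(ip_addr_output):
    """Return (address, flags) for inet6 entries flagged dadfailed or tentative."""
    flagged = []
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) < 2 or fields[0] != "inet6":
            continue
        # Flags appear after the address, alongside the scope keywords.
        flags = [f for f in fields[2:] if f in ("dadfailed", "tentative")]
        if flags:
            flagged.append((fields[1], flags))
    return flagged
```

Run against the gateway-1 output above, this reports only the EgressIP address; the node's own address 2600:52:7:94::21/64 and the link-local address are not flagged.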
Affected Platforms:
Is it an
- customer issue / SD
- internal RedHat testing failure
If it is an internal RedHat testing failure:
- We have a dual-stack environment available to run any additional tests the OCP EgressIP engineering team requires.
A partner is also facing the same issue in their environment.
- clones: OCPBUGS-60468 EgressIP Failover not working (Verified)
- is blocked by: OCPBUGS-60468 EgressIP Failover not working (Verified)
- links to