Type: Bug
Resolution: Done-Errata
Priority: Major
Affects Versions: 4.15.z, 4.16.z
Release Note Type: Bug Fix
Status: Done
Egress IP doesn’t work in a multihomed VRF setup; packets cannot be delivered to the next hop for routing.
Topology description
SNO with the following configuration (a quick verification sketch follows the list):
Interface 1 - Machine network
Interface 2 - VRF with an IP interface and a default route
Interface 3 - Interface in the main routing table with a static route
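A quick way to confirm this layout on the node (standard iproute2 commands; interface names as in the configuration below):

# Brief view of links, including VRF enslavement (master vrf1082 once configured)
ip -br link show
# Brief view of addresses per interface
ip -br addr show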
Configuration:
---
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: vrf-1082-with-ip-iface-left-transport
  annotations:
    description: Create VLAN, IP Interface and VRF on Transport node LEFT
spec:
  nodeSelector:
    transport/node: "left"
  desiredState:
    interfaces:
      - name: enp5s0f0.1082
        state: up
        type: vlan
        vlan:
          base-iface: enp5s0f0
          id: 1082
        ipv4:
          enabled: true
          address:
            - ip: 10.10.82.2
              prefix-length: 24
      - name: vrf1082
        state: up
        type: vrf
        vrf:
          port:
            - enp5s0f0.1082
          route-table-id: 1082
    route-rules:
      config:
        - ip-to: 172.30.0.0/16
          priority: 998
          route-table: 254
        - ip-to: 10.128.0.0/14
          priority: 998
          route-table: 254
        - ip-to: 169.254.169.0/29
          priority: 998
          route-table: 254
    routes:
      config:
        - destination: 0.0.0.0/0
          metric: 150
          next-hop-address: 10.10.82.1
          next-hop-interface: enp5s0f0.1082
          table-id: 1082
The policy above creates the IP interface and VRF on the node:
### List of VRFs
[core@pool2-controller1 ~]$ ip l show vrf1082
6613: vrf1082: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 72:75:e4:f8:b4:7b brd ff:ff:ff:ff:ff:ff
[core@pool2-controller1 ~]$ ip vrf list
Name              Table
-----------------------
vrf1082           1082

### Default routing table
[core@pool2-controller1 ~]$ ip r
default via 10.1.196.254 dev br-ex proto static metric 48
10.1.196.0/24 dev br-ex proto kernel scope link src 10.1.196.21 metric 48
10.128.0.0/14 via 10.131.0.1 dev ovn-k8s-mp0
10.131.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.131.0.2
169.254.169.0/29 dev br-ex proto kernel scope link src 169.254.169.2
169.254.169.1 dev br-ex src 10.1.196.21
169.254.169.3 via 10.131.0.1 dev ovn-k8s-mp0
172.30.0.0/16 via 169.254.169.4 dev br-ex src 169.254.169.2 mtu 1400

### VRF Routing table 1082
[core@pool2-controller1 ~]$ ip r show table 1082
default via 10.10.82.1 dev enp5s0f0.1082 proto static metric 150
10.10.82.0/24 dev enp5s0f0.1082 proto kernel scope link src 10.10.82.2 metric 400
local 10.10.82.2 dev enp5s0f0.1082 proto kernel scope host src 10.10.82.2
local 10.10.82.110 dev enp5s0f0.1082 proto kernel scope host src 10.10.82.110
broadcast 10.10.82.255 dev enp5s0f0.1082 proto kernel scope link src 10.10.82.2
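The NNCP also installs the priority-998 route rules that keep cluster-internal traffic (service, pod, and metadata CIDRs) in the main table; a quick sanity check for them (a sketch, assuming standard iproute2 output):

# Verify the leak rules from the NNCP landed
ip rule | grep 998
# Expect three rules for 172.30.0.0/16, 10.128.0.0/14 and 169.254.169.0/29 pointing at table 254 (main)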
Deploy Application
---
# Create Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: egressip-test
  labels:
    egress: vrf1082
---
# Create EgressIP for the namespace
apiVersion: k8s.ovn.org/v1
kind: EgressIP
metadata:
  name: egressip-vrf-1082
spec:
  egressIPs:
    - 10.10.82.110
  namespaceSelector:
    matchLabels:
      egress: vrf1082
---
# Deploy app
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server
  namespace: egressip-test
spec:
  selector:
    matchLabels:
      app: server
  template:
    metadata:
      labels:
        app: server
    spec:
      containers:
        - name: server
          image: quay.io/mancubus77/podman-banner
          ports:
            - name: http
              containerPort: 8080
          volumeMounts:
            - name: npm-empty-dir
              mountPath: /.npm
      volumes:
        - name: npm-empty-dir
          emptyDir: {}
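Before exercising the egress path, it can help to confirm the EgressIP assignment and the pod IPs that show up in the node-level output below (standard oc commands; resource names from the manifests above):

# EgressIP object and the node it was assigned to
oc get egressip egressip-vrf-1082 -o wide
# Pod IPs that should appear in the SNAT rules and ip rules on the node
oc get pods -n egressip-test -o wide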
OCP Behaviour
With the configuration above, OVN-Kubernetes behaves as it is supposed to:
# IPTables egress created
[core@pool2-controller1 ~]$ sudo iptables -nvL OVN-KUBE-EGRESS-IP-MULTI-NIC -t nat
Chain OVN-KUBE-EGRESS-IP-MULTI-NIC (1 references)
 pkts bytes target prot opt in  out            source       destination
    0     0 SNAT   0    --  *   enp5s0f0.1082  10.130.0.45  0.0.0.0/0    to:10.10.82.110
    0     0 SNAT   0    --  *   enp5s0f0.1082  10.131.0.15  0.0.0.0/0    to:10.10.82.110

# IP Rule created
[core@pool2-controller1 ~]$ ip rule | grep 6000
6000: from 10.131.0.15 lookup 7614
6000: from 10.130.0.45 lookup 7614
Expected behaviour
- Packets are forwarded to the marked egress node
- All egress packets from pods in the namespace carry the configured source IP
- Packets are routed according to the VRF routing table
Actual behaviour
- ✅ Packets are forwarded to the marked egress node
- ✅ All egress packets from pods in the namespace carry the configured source IP
- ❌ Packets are routed according to the VRF routing table
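The failing hop can be watched directly on the node; the dump below was captured with something along these lines (a sketch, the exact tcpdump flags may differ from the original capture):

# Follow the test flow and any ARP fallout across all interfaces
sudo tcpdump -i any -nne 'host 1.1.1.1 or arp'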
## Command from pod
~ $ curl 1.1.1.1

#### ---=== PACKET DUMP ===---
# Packet leaving the pod's OVN port
10:48:23.615730 0343c50016330fb P IP 10.131.0.15.57974 > 1.1.1.1.80: Flags [S], seq 2114359519, win 32640, options [mss 1360,sackOK,TS val 2792922673 ecr 0,nop,wscale 7], length 0

# Packet leaving the OVN domain
10:48:23.615858 ovn-k8s-mp0 In IP 10.131.0.15.57974 > 1.1.1.1.80: Flags [S], seq 2114359519, win 32640, options [mss 1360,sackOK,TS val 2792922673 ecr 0,nop,wscale 7], length 0

# Node tries to resolve the destination IP via ARP (on the VLAN interface)
10:48:23.615903 enp5s0f0.1082 Out ARP, Request who-has 1.1.1.1 tell 10.10.82.2, length 28
Root cause
According to the OVN-K source code, when an EgressIP node is added, the controller searches for routes associated with a given interface based on its ifindex. Because the VRF uses its own routing table ID (1082) rather than the main table (254), the OVN-K controller does not find any routes associated with the interface and creates the following rule per pod, together with a bare default route in the per-pod table:
# IP RULE for 2 pods in the namespace
[core@pool2-controller1 ~]$ ip rule
6000: from 10.131.0.15 lookup 7614
6000: from 10.130.0.45 lookup 7614

# Routing table 7614
[core@pool2-controller1 ~]$ ip route show table 7614
default dev enp5s0f0.1082
The entry above declares every destination directly attached (point-to-point) on this interface, so the Linux routing engine sends an ARP request in an attempt to resolve the MAC address of the destination (1.1.1.1 in this example).
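This is easy to reproduce by hand: querying the interface's routes the way the controller does (against the main table) returns nothing, while the usable default route only exists in the VRF table. Note also that the per-pod table ID equals the interface ifindex plus 1000 in the outputs of this report (ifindex 6614, table 7614); treat that offset as an observation here, not a guaranteed contract.

# What the controller effectively sees: no routes for the interface in main (254)
ip route show table main dev enp5s0f0.1082
# Where the real next hop lives
ip route show table 1082
# Cross-check the ifindex behind table 7614
ip -o link show enp5s0f0.1082 | cut -d: -f1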
Hack
To make it work, the VRF's default route (or the relevant static route) needs to be added to the per-pod EgressIP routing table.
# Add proper route
[core@pool2-controller1 ~]$ sudo ip route add default via 10.10.82.1 dev enp5s0f0.1082 table 7614

# Delete default route
[core@pool2-controller1 ~]$ sudo ip route del default dev enp5s0f0.1082 table 7614

# Ensure route installed
[core@pool2-controller1 ~]$ ip route show table 7614
default via 10.10.82.1 dev enp5s0f0.1082
default dev enp5s0f0.1082 metric 10
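Since the table number is interface-specific, it can be discovered from the EgressIP rules instead of hard-coded (a hypothetical one-liner; priority 6000 matches the rules shown above):

# Extract the table ID referenced by the priority-6000 EgressIP rules
ip rule | awk '$1 == "6000:" {print $NF}' | sort -u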
New behaviour
# Packet leaving the pod's OVN port
11:01:25.915965 0343c50016330fb P IP 10.131.0.15.35796 > 1.1.1.1.80: Flags [S], seq 1447686540, win 32640, options [mss 1360,sackOK,TS val 2793704974 ecr 0,nop,wscale 7], length 0

# Packet leaving the OVN domain
11:01:25.917868 ovn-k8s-mp0 In IP 10.131.0.15.35796 > 1.1.1.1.80: Flags [S], seq 1447686540, win 32640, options [mss 1360,sackOK,TS val 2793704974 ecr 0,nop,wscale 7], length 0

# Packet is addressed to the default GW, toward the 10.10.82.1 router
11:04:11.136937 enp5s0f0.1082 Out ifindex 6614 b4:96:91:25:93:20 > b4:96:91:1d:7f:f0, ethertype IPv4 (0x0800), length 74: 10.10.82.110.kitim > 1.1.1.1.http: Flags [S], seq 2398495404, win 32640, options [mss 1360,sackOK,TS val 2794031032 ecr 0,nop,wscale 7], length 0

# Validate MAC
[core@pool2-controller1 ~]$ arp -an | grep f0
? (10.10.82.1) at b4:96:91:1d:7f:f0 [ether] on enp5s0f0.1082
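End to end, the fix can be re-verified from inside the namespace (a sketch, assuming curl is available in the server image):

# Repeat the test from an egress pod; traffic should now leave via 10.10.82.110
oc exec -n egressip-test deploy/server -- curl -sS --max-time 5 http://1.1.1.1
# On the node, the neighbor entry for the VRF gateway should stay resolved
ip neigh show dev enp5s0f0.1082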
Document with more details: https://docs.google.com/document/d/1ZLIqWjs85_zBZ9J92L63zwbds66gMAnLhShtlPFH9Ro/edit
Links to:
- RHBA-2024:6004 OpenShift Container Platform 4.16.z bug fix update