OpenShift Bugs / OCPBUGS-38704

[4.16] EgressIP doesn't work in VRF Mode


      Previously, the `egressip` controller failed to correctly manage the assignment of `EgressIP` addresses for network interfaces associated with Virtual Routing and Forwarding (VRF) tables. As a result, when a VRF instance was configured for a network interface, packets were not routed correctly because OVN-K used the main routing table instead of the VRF’s routing table. With this update, the `egressip` controller uses the VRF’s routing table when a VRF instance is configured on a network interface, ensuring accurate `EgressIP` assignment and correct traffic routing. (link:https://issues.redhat.com/browse/OCPBUGS-38704[*OCPBUGS-38704*])


    • Bug Fix
    • Done

       Egress IP doesn't work in a multihomed VRF setup; packets cannot be delivered to the next hop for routing.

       

      Topology description 

       

      SNO with the following configuration:

      Interface 1 - Machine network

      Interface 2 - VRF with an IP interface and a default route in its table

      Interface 3 - Interface in the main routing table with a static route

       

      Configuration: 

       

       

      ---
      apiVersion: nmstate.io/v1
      kind: NodeNetworkConfigurationPolicy
      metadata:
       name: vrf-1082-with-ip-iface-left-transport
       annotations:
         description: Create VLAN, IP Interface and VRF on Transport node LEFT
      spec:
       nodeSelector:
         transport/node: "left"
       desiredState:
         interfaces:
           - ipv4:
               address:
                 - ip: 10.10.82.2
                   prefix-length: 24
               enabled: true
             name: enp5s0f0.1082
             state: up
             type: vlan
             vlan:
               base-iface: enp5s0f0
               id: 1082
           - name: vrf1082
             state: up
             type: vrf
             vrf:
               port:
                 - enp5s0f0.1082
               route-table-id: 1082
         route-rules:
           config:
             - ip-to: 172.30.0.0/16
               priority: 998
               route-table: 254
             - ip-to: 10.128.0.0/14
               priority: 998
               route-table: 254
             - ip-to: 169.254.169.0/29
               priority: 998
               route-table: 254
         routes:
           config:
             - destination: 0.0.0.0/0
               metric: 150
               next-hop-address: 10.10.82.1
               next-hop-interface: enp5s0f0.1082
               table-id: 1082
      

       

       

      The policy above creates the VLAN IP interface and the VRF on the node.
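The policy can be applied and verified with the usual NMState flow. A sketch; the filename `vrf-nncp.yaml` and use of the `oc` CLI are assumptions, not from this report:

```shell
# Apply the NodeNetworkConfigurationPolicy (hypothetical filename)
oc apply -f vrf-nncp.yaml

# Wait until NMState reports the policy as successfully applied
oc wait nncp vrf-1082-with-ip-iface-left-transport \
  --for=condition=Available --timeout=120s

# On the node: confirm the VLAN interface is enslaved to the VRF
# (detail output should show "vrf_slave table 1082")
ip -d link show enp5s0f0.1082
```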

       

       

      ### List of VRFs
       
      [core@pool2-controller1 ~]$ ip l show vrf1082
      6613: vrf1082: <NOARP,MASTER,UP,LOWER_UP> mtu 65575 qdisc noqueue state UP mode DEFAULT group default qlen 1000
          link/ether 72:75:e4:f8:b4:7b brd ff:ff:ff:ff:ff:ff
      
      
      [core@pool2-controller1 ~]$ ip vrf list
      Name              Table
      -----------------------
      vrf1082           1082
      
      
      
      
      
      
      
      ### Default routing table
      [core@pool2-controller1 ~]$ ip r
      default via 10.1.196.254 dev br-ex proto static metric 48
      10.1.196.0/24 dev br-ex proto kernel scope link src 10.1.196.21 metric 48
      10.128.0.0/14 via 10.131.0.1 dev ovn-k8s-mp0
      10.131.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.131.0.2
      169.254.169.0/29 dev br-ex proto kernel scope link src 169.254.169.2
      169.254.169.1 dev br-ex src 10.1.196.21
      169.254.169.3 via 10.131.0.1 dev ovn-k8s-mp0
      172.30.0.0/16 via 169.254.169.4 dev br-ex src 169.254.169.2 mtu 1400
       
      ### VRF Routing table 1082
       
      [core@pool2-controller1 ~]$ ip r show table 1082
      default via 10.10.82.1 dev enp5s0f0.1082 proto static metric 150
      10.10.82.0/24 dev enp5s0f0.1082 proto kernel scope link src 10.10.82.2 metric 400
      local 10.10.82.2 dev enp5s0f0.1082 proto kernel scope host src 10.10.82.2
      local 10.10.82.110 dev enp5s0f0.1082 proto kernel scope host src 10.10.82.110
      broadcast 10.10.82.255 dev enp5s0f0.1082 proto kernel scope link src 10.10.82.2
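For traffic generated through the VRF device itself, the kernel reaches table 1082 via the `l3mdev` policy-routing rule that is installed when the first VRF is created. A sketch; exact output varies by kernel and iproute2 version:

```shell
# The l3mdev rule (default preference 1000) redirects lookups for
# VRF-enslaved traffic to the VRF's own table, here 1082
ip rule show
# 0:      from all lookup local
# 1000:   from all lookup [l3mdev-table]
# 32766:  from all lookup main
# 32767:  from all lookup default
```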
      

       

       

      Deploy Application 

       

       

      ---
      # Create Namespace
      apiVersion: v1
      kind: Namespace
      metadata:
       name: egressip-test
       labels:
         egress: vrf1082
      ---
      # Create EgressIP for the namespace
      apiVersion: k8s.ovn.org/v1
      kind: EgressIP
      metadata:
       name: egressip-vrf-1082
      spec:
       egressIPs:
       - 10.10.82.110
       namespaceSelector:
         matchLabels:
           egress: vrf1082
      ---
      # Deploy app
      apiVersion: apps/v1
      kind: Deployment
      metadata:
       name: server
       namespace: egressip-test
      spec:
       selector:
         matchLabels:
           app: server
       template:
         metadata:
           labels:
             app: server
         spec:
           containers:
             - name: server
               image: quay.io/mancubus77/podman-banner
               ports:
                 - name: http
                   containerPort: 8080
               volumeMounts:
                 - name: npm-empty-dir
                   mountPath: /.npm
           volumes:
             - name: npm-empty-dir
               emptyDir: {}
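After applying the manifests, the EgressIP assignment can be checked from the cluster side. A sketch; printer columns depend on the OVN-Kubernetes version:

```shell
# Confirm the namespace carries the selector label
oc get ns egressip-test --show-labels

# Confirm the EgressIP object and which node it was assigned to
oc get egressip egressip-vrf-1082
# Expected columns: NAME, EGRESSIPS, ASSIGNED NODE, ASSIGNED EGRESSIPS
```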
      

       

      OCP Behaviour

       

      With the configuration above, OVN-K behaves as it is supposed to:

       

      # IPTables egress created
       
      [core@pool2-controller1 ~]$ sudo  iptables -nvL OVN-KUBE-EGRESS-IP-MULTI-NIC -t nat
      Chain OVN-KUBE-EGRESS-IP-MULTI-NIC (1 references)
       pkts bytes target     prot opt in     out     source               destination
          0     0 SNAT       0    --  *      enp5s0f0.1082  10.130.0.45          0.0.0.0/0            to:10.10.82.110
          0     0 SNAT       0    --  *      enp5s0f0.1082  10.131.0.15          0.0.0.0/0            to:10.10.82.110
      
      
      
      # IP Rule created 
      [core@pool2-controller1 ~]$ ip rule | grep 6000
      6000: from 10.131.0.15 lookup 7614
      6000: from 10.130.0.45 lookup 7614 
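The per-pod rules can also be checked programmatically. A minimal sketch; the `lookup_table` helper is hypothetical and operates on `ip rule`-style output such as the capture above:

```shell
# Print the routing table that policy routing selects for a given source IP,
# by scanning `ip rule`-style output on stdin.
lookup_table() {
  awk -v src="$1" '$0 ~ ("from " src " ") {
    for (i = 1; i <= NF; i++) if ($i == "lookup") print $(i + 1)
  }'
}

# Example, using the rules captured above:
printf '6000: from 10.131.0.15 lookup 7614\n6000: from 10.130.0.45 lookup 7614\n' \
  | lookup_table 10.131.0.15
# prints: 7614
```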
      

      Expected behavior

      • Packets forward to marked Egress Nodes
      • All egress packets from pods in the namespace have configured SRC IP 
      • Packets forward according to VRF Routing table

       

      Actual Behavior  

      • ✅ Packets forward to marked Egress Nodes
      • ✅ All egress packets from pods in the namespace have configured SRC IP 
      • ❌ Packets forward according to VRF Routing table

       

      ## Command from pod
      ~ $ curl 1.1.1.1
      
      
      
      #### ---=== PACKET DUMP ===---
       
      # Packet leaving Pod's OVN Port
      10:48:23.615730 0343c50016330fb P   IP 10.131.0.15.57974 > 1.1.1.1.80: Flags [S], seq 2114359519, win 32640, options [mss 1360,sackOK,TS val 2792922673 ecr 0,nop,wscale 7], length 0
       
      # Packet leaving OVN-Domain
      10:48:23.615858 ovn-k8s-mp0 In  IP 10.131.0.15.57974 > 1.1.1.1.80: Flags [S], seq 2114359519, win 32640, options [mss 1360,sackOK,TS val 2792922673 ecr 0,nop,wscale 7], length 0
      
      
      
      # Node tries to resolve Destination IP via ARP (on vlan Interface)
      10:48:23.615903 enp5s0f0.1082 Out ARP, Request who-has 1.1.1.1 tell 10.10.82.2, length 28 
      

      Root cause 

      According to the OVN-K source code, when an EgressIP node is added, the controller searches for routes associated with a given interface based on its ifindex. Because a VRF has a routing table ID different from the main table (default 254), the OVN-K controller does not find any routes associated with the interface and creates the following rule per pod:

       

       

      # IP RULE for 2 pods in the namespace
      [core@pool2-controller1 ~]$ ip rule
      6000: from 10.131.0.15 lookup 7614
      6000: from 10.130.0.45 lookup 7614
       
      # Routing table 7614
      [core@pool2-controller1 ~]$ ip route show table 7614
      default dev enp5s0f0.1082
      

       

       

      The entry above says that all traffic on this interface is directly attached (point-to-point), so the Linux routing engine sends an ARP request in an attempt to find the MAC address of the destination (1.1.1.1 in this example).
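This can be confirmed from the node with a route lookup for the pod's source address. A sketch; `ip route get` with `from` needs an `iif`, and `ovn-k8s-mp0` is assumed as the ingress device based on the packet dump above:

```shell
# Simulate the kernel's lookup for pod traffic matching the pref-6000 rule
ip route get 1.1.1.1 from 10.131.0.15 iif ovn-k8s-mp0
# Expect: dev enp5s0f0.1082, table 7614, and no "via <gateway>" --
# the destination is treated as on-link, hence the ARP for 1.1.1.1
```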

       

      Hack

      To make it work, the VRF's default route (or the associated static route) needs to be added to the table OVN-K created.

       

      # Add proper route 
      [core@pool2-controller1 ~]$ sudo ip route add default via 10.10.82.1 dev enp5s0f0.1082 table 7614
       
      # Delete default route 
      [core@pool2-controller1 ~]$ sudo ip route del default dev enp5s0f0.1082 table 7614
       
      # Ensure route installed 
      [core@pool2-controller1 ~]$ ip route show table 7614
      default via 10.10.82.1 dev enp5s0f0.1082
      default dev enp5s0f0.1082 metric 10
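The two steps can also be collapsed into one atomic command. A sketch; note this is a temporary workaround, since OVN-K owns the per-interface table (7614 on this node; the number differs per interface) and may rewrite it:

```shell
# Replace the device-only default route with a gateway route in one step
sudo ip route replace default via 10.10.82.1 dev enp5s0f0.1082 table 7614
```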
       
      

      New behaviour

       

      # Packet leaving Pod's OVN Port
      11:01:25.915965 0343c50016330fb P   IP 10.131.0.15.35796 > 1.1.1.1.80: Flags [S], seq 1447686540, win 32640, options [mss 1360,sackOK,TS val 2793704974 ecr 0,nop,wscale 7], length 0
       
      # Packet leaving OVN-Domain
      11:01:25.917868 ovn-k8s-mp0 In  IP 10.131.0.15.35796 > 1.1.1.1.80: Flags [S], seq 1447686540, win 32640, options [mss 1360,sackOK,TS val 2793704974 ecr 0,nop,wscale 7], length 0
       
      # Packet addressed to the default GW, the 10.10.82.1 router
      11:04:11.136937 enp5s0f0.1082 Out ifindex 6614 b4:96:91:25:93:20 > b4:96:91:1d:7f:f0, ethertype IPv4 (0x0800), length 74: 10.10.82.110.kitim > 1.1.1.1.http: Flags [S], seq 2398495404, win 32640, options [mss 1360,sackOK,TS val 2794031032 ecr 0,nop,wscale 7], length 0
      
      
      
      # Validate MAC 
      [core@pool2-controller1 ~]$ arp -an | grep f0
      ? (10.10.82.1) at b4:96:91:1d:7f:f0 [ether] on enp5s0f0.1082
       
      

       Document with more details: https://docs.google.com/document/d/1ZLIqWjs85_zBZ9J92L63zwbds66gMAnLhShtlPFH9Ro/edit

       

              mkennell@redhat.com Martin Kennelly
              stanatredhat Stan Kozlov
              Jean Chen Jean Chen
              Ronan Hennessy Ronan Hennessy