  1. OpenShift Bugs
  2. OCPBUGS-74246

EgressIP on Primary Interface is not working as expected even after setting routingViaHost to true after SDN to OVN migration

    • Bug
    • Resolution: Not a Bug
    • Critical
    • 4.16.z
    • Incidents & Support
    • Critical
    • Customer Escalated

      Description of problem: UOB is migrating the network plugin from OpenShift SDN to OVN-Kubernetes. Static routes configured on the node were honoured under OpenShift SDN, but they are not honoured under OVN-Kubernetes when an Egress IP is used on the primary interface.

      When the following conditions apply:
      1. Run OVN-Kubernetes as a network plugin
      2. Implement static routing on the node
      3. Attach Egress IP to the namespace
      4. Egress IP is attached to the primary interface (br-ex)

      The Egress IP traffic leaves the node, but it does not follow the static routes configured on the node.

      After investigation, we found that the traffic leaves through the OVN gateway router, which uses its own routing table rather than the host routing table.

      Based on this document: https://docs.redhat.com/en/documentation/openshift_container_platform/4.19/html/ovn-kubernetes_network_plugin/configuring-gateway

      routingViaHost=true is supposed to cover Egress IP traffic as well, since nowhere does the document mention that Egress IP traffic is excluded.

      Version-Release number of selected component (if applicable): 4.16.z

      How reproducible: Reproduces when using the OVN-Kubernetes network plugin with an Egress IP on the primary interface, static routes configured on the node, and routingViaHost=true.

      Steps to Reproduce:

      1. Implement static routing on the node with a MachineConfig. For example, external traffic to the IP block 172.18.184.0/22 must be sent to the gateway 172.18.182.1, while all other traffic goes to the default gateway:

      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: log
        name: 98-log-static-route
      spec:
        config:
          ignition:
            version: 3.2.0
          systemd:
            units:
            - enabled: true
              name: update-static-route.service
              contents: |
                [Unit]
                Description=Updates static route
                Wants=network-online.target
                After=network-online.target ignition-firstboot-complete.service ovs-configuration.service
                Before=kubelet.service crio.service

                [Service]
                Type=oneshot
                RemainAfterExit=true
                ExecStart=ip route add 172.18.184.0/22 via 172.18.182.1 dev br-ex

                [Install]
                WantedBy=kubelet.service
      

      Confirm that the static route is present on the node:

      [root@cocwkusgv113 /]# ip r l
      default via 172.18.182.200 dev br-ex proto static metric 48
      10.165.0.0/16 via 10.165.30.1 dev ovn-k8s-mp0
      10.165.30.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.165.30.2
      10.166.0.0/16 via 169.254.169.4 dev br-ex src 169.254.169.2 mtu 1400
      169.254.169.0/29 dev br-ex proto kernel scope link src 169.254.169.2
      169.254.169.1 dev br-ex src 172.18.183.62
      169.254.169.3 via 10.165.30.1 dev ovn-k8s-mp0
      172.18.182.0/23 dev br-ex proto kernel scope link src 172.18.183.62 metric 48
      172.18.184.0/22 via 172.18.182.1 dev br-ex
      

      2. Create an Egress IP on the primary interface and attach it to the namespace.

      3. Test the connection from the namespace that has the Egress IP attached, and capture the traffic:

      Pod IP: 10.165.30.9

      Destination IP: 172.18.184.182

      tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
      listening on any, link-type LINUX_SLL2 (Linux cooked v2), capture size 262144 bytes
      20:38:56.929150 fc65f09869b4c85 P   IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, win 32640, options [mss 1360,sackOK,TS val 3444654044 ecr 0,nop,wscale 7], length 0
      20:38:56.929544 genev_sys_6081 Out IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, win 32640, options [mss 1360,sackOK,TS val 3444654044 ecr 0,nop,wscale 7], length 0
      20:38:57.957424 fc65f09869b4c85 P   IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, win 32640, options [mss 1360,sackOK,TS val 3444655072 ecr 0,nop,wscale 7], length 0
      20:38:57.957499 genev_sys_6081 Out IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, win 32640, options [mss 1360,sackOK,TS val 3444655072 ecr 0,nop,wscale 7], length 0
      20:39:00.005411 fc65f09869b4c85 P   IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, win 32640, options [mss 1360,sackOK,TS val 3444657120 ecr 0,nop,wscale 7], length 0
      20:39:00.005469 genev_sys_6081 Out IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, win 32640, options [mss 1360,sackOK,TS val 3444657120 ecr 0,nop,wscale 7], length 0
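
      One way to read the capture excerpt above is to check it programmatically. The sketch below is illustrative only: it parses an abridged copy of the six captured lines (TCP options omitted, pod-side interface name written consistently), and the regular expression and the parse helper are assumptions made for this example, not part of tcpdump. It reports the TCP flags, the sequence numbers, whether any packet from 172.18.184.182 appears, and the gaps between packets seen on the pod-side interface.

```python
import re
from datetime import datetime

# Abridged copy of the six captured lines (TCP options omitted for brevity).
CAPTURE = """\
20:38:56.929150 fc65f09869b4c85 P   IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, length 0
20:38:56.929544 genev_sys_6081 Out IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, length 0
20:38:57.957424 fc65f09869b4c85 P   IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, length 0
20:38:57.957499 genev_sys_6081 Out IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, length 0
20:39:00.005411 fc65f09869b4c85 P   IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, length 0
20:39:00.005469 genev_sys_6081 Out IP 10.165.30.9.59832 > 172.18.184.182.443: Flags [S], seq 3056481693, length 0
"""

# Hypothetical pattern for the "tcpdump -i any -e" line layout shown above.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{6}) (?P<iface>\S+)\s+(?P<dir>\S+)\s+IP "
    r"(?P<src>\S+) > (?P<dst>\S+): Flags \[(?P<flags>[^\]]+)\], seq (?P<seq>\d+)"
)


def parse(capture):
    """Turn each matching capture line into a dict of its fields."""
    packets = []
    for line in capture.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            packet = match.groupdict()
            packet["time"] = datetime.strptime(packet["ts"], "%H:%M:%S.%f")
            packets.append(packet)
    return packets


packets = parse(CAPTURE)
flags = {p["flags"] for p in packets}
seqs = {p["seq"] for p in packets}
# Any packet whose source is the destination host would be a reply.
replies = [p for p in packets if p["src"].startswith("172.18.184.182.")]
# Packets seen on the pod-side interface (direction "P" in the capture).
pod_side = [p for p in packets if p["dir"] == "P"]
gaps = [
    round((later["time"] - earlier["time"]).total_seconds(), 2)
    for earlier, later in zip(pod_side, pod_side[1:])
]

print("flags seen:", flags)
print("distinct sequence numbers:", len(seqs))
print("replies from destination:", len(replies))
print("gaps between pod-side packets (s):", gaps)
```

      In this excerpt every packet is a SYN with the same sequence number and no packet from the destination appears, which is the pattern of an unanswered SYN being retransmitted.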

      Actual results: Egress IP traffic is not using the static routing when exiting the node.

      Expected results: Egress IP traffic uses the static routing when exiting the node.

      Additional info: We tried adding the route directly to the gateway router in OVN-Kubernetes:

      [uncpecp@lxecpusgv102 ~]$ oc rsh -c nbdb ovnkube-node-tz745 bash
      [root@cocwkusgv111 ~]# ovn-nbctl lr-list
      4bf08b2b-8e8e-42f2-a20b-f5d32f478d9f (GR_cocwkusgv111.sg.uobnet.com)
      3e03b353-3cfd-4993-a652-1913408add65 (ovn_cluster_router)
      [root@cocwkusgv111 ~]# ovn-nbctl lr-route-add 4bf08b2b-8e8e-42f2-a20b-f5d32f478d9f 172.18.184.0/22 172.18.182.1
      [root@cocwkusgv111 ~]# ovn-nbctl lr-list
      4bf08b2b-8e8e-42f2-a20b-f5d32f478d9f (GR_cocwkusgv111.sg.uobnet.com)
      3e03b353-3cfd-4993-a652-1913408add65 (ovn_cluster_router)
      [root@cocwkusgv111 ~]# ovn-nbctl lr-route-list
      ovn-nbctl: 'lr-route-list' command requires at least 1 arguments
      [root@cocwkusgv111 ~]# ovn-nbctl lr-route-list 4bf08b2b-8e8e-42f2-a20b-f5d32f478d9f
      IPv4 Routes
      Route Table <main>:
               169.254.169.0/29             169.254.169.4 dst-ip rtoe-GR_cocwkusgv111.sg.uobnet.com
                172.18.184.0/22              172.18.182.1 dst-ip
                  10.165.0.0/16                100.64.0.1 dst-ip
                      0.0.0.0/0            172.18.182.200 dst-ip rtoe-GR_cocwkusgv111.sg.uobnet.com
      

      We then tested the connection again and it worked, which indicates that the static routes on the node are not honoured for Egress IP traffic.
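
      The difference between the two routing tables can be illustrated with a plain longest-prefix-match lookup. The sketch below is a simplified model, not OVN-Kubernetes code: the lookup helper is hypothetical, the host table is abridged from the ip r l output above, and the "before" gateway router table is assumed to be the lr-route-list output without the manually added route.

```python
import ipaddress


def lookup(routes, destination):
    """Return the (prefix, next_hop) chosen by longest-prefix match, or None."""
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), next_hop)
        for prefix, next_hop in routes
        if dest in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return None
    network, next_hop = max(matches, key=lambda match: match[0].prefixlen)
    return str(network), next_hop


# Host routing table, abridged from the "ip r l" output in this report.
HOST_ROUTES = [
    ("0.0.0.0/0", "172.18.182.200"),
    ("172.18.182.0/23", "on-link"),
    ("172.18.184.0/22", "172.18.182.1"),
]

# Gateway router routes from "ovn-nbctl lr-route-list", minus the manually
# added entry (assumed to be the state before the workaround).
GR_ROUTES_BEFORE = [
    ("169.254.169.0/29", "169.254.169.4"),
    ("10.165.0.0/16", "100.64.0.1"),
    ("0.0.0.0/0", "172.18.182.200"),
]

# The same table after "ovn-nbctl lr-route-add ... 172.18.184.0/22 172.18.182.1".
GR_ROUTES_AFTER = GR_ROUTES_BEFORE + [("172.18.184.0/22", "172.18.182.1")]

DESTINATION = "172.18.184.182"
host_choice = lookup(HOST_ROUTES, DESTINATION)
gr_before = lookup(GR_ROUTES_BEFORE, DESTINATION)
gr_after = lookup(GR_ROUTES_AFTER, DESTINATION)

print("host table:          ", host_choice)
print("gateway router before:", gr_before)
print("gateway router after: ", gr_after)
```

      Under these assumptions the host table selects 172.18.182.1 for the destination, while the gateway router falls back to the default next hop 172.18.182.200 until the /22 route is added to it.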

      Affected Platforms: OpenShift Container Platform

      Expected Outcome: We need engineering to advise how to set up the network on OCP given the following conditions:

      1. Run OVN-Kubernetes as a network plugin
      2. Use Egress IP with the traffic (traffic exiting the cluster must use Egress IP)
      3. Implement static routing on OCP
      4. No network configuration change outside OCP

      Attachments:

      Network command outputs: https://attachments.access.redhat.com/hydra/rest/cases/04329820/attachments/1499a61c-cd51-4678-aceb-1be70a6f69db?usePresignedUrl=true

      Output from tcpdump -i any host 172.18.184.182 -nnn -y LINUX_SLL2 -e: https://attachments.access.redhat.com/hydra/rest/cases/04329820/attachments/c1d8d4af-cc25-45f5-9769-e45e792f48b4?usePresignedUrl=true

              sseethar Surya Seetharaman
              rh-ee-nekudoms Nattapong Ekudomsuk
              Anurag Saxena