Red Hat OpenStack Services on OpenShift
OSPRH-13154

[ovn-routing] packet loss between VMs running on different computes


    • Type: Bug
    • Resolution: Unresolved
    • Component: ovn-bgp-agent
    • Severity: Important

       

      This affects only BGP setups with ovn-routing configured.

      It affects connectivity that uses geneve tunnels between compute nodes. Hence:

      • no packet loss when connecting from an external machine to VM FIPs (or to VMs connected directly to an external network).
      • no packet loss between VMs running on the same compute.
      • packet loss between VMs running on different computes and connected to the same tenant network.
      • packet loss between VMs running on different computes and connected to different tenant networks (networks connected through a neutron router). See the reproduction sketch below.
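
      A minimal reproduction sketch for the cross-compute case (the instance, image, flavor and network names, as well as the hosts in the availability-zone hints, are placeholders rather than the real names from this environment):

      # boot one VM on each compute, both attached to the same tenant network
      openstack server create --image cirros --flavor m1.tiny --network tenant-net \
          --availability-zone nova:compute-0 vm-0
      openstack server create --image cirros --flavor m1.tiny --network tenant-net \
          --availability-zone nova:compute-1 vm-1
      # from vm-0's console, ping vm-1's fixed IP: intermittent loss shows up
      # only in this cross-compute case
      ping -c 100 <vm-1 fixed IP>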

       

      It seems this issue is related to a problem connecting geneve tunnels when BGP is used with ovn-routing. Each compute's ovn-encap-ip address is configured on the loopback interface and exposed via BGP. The computes are connected to their peer leafs through eth2 and eth3 (connected to leaf-0 and leaf-1, respectively), but the corresponding uplink IPs are added to the compute's br-ex and br-ex-2 interfaces, respectively.
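
      As a quick check that the encap IP really sits on the loopback rather than on one of the uplink bridges (a sketch; the expected address is the ovn-encap-ip shown in the ovs-vsctl output below):

      # 172.30.0.2 is expected to appear here as a /32, not on br-ex/br-ex-2
      ip a s lo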

       

      [root@compute-krwxsiik-0 ~]# ovs-vsctl get open . external_ids:ovn-encap-ip
      "172.30.0.2" 
      

       

       

      [root@compute-krwxsiik-0 ~]# podman exec -it -u root frr vtysh -c 'show run'
      ...
      router bgp 64999
          bgp router-id 172.30.0.2
      ...
       neighbor 100.64.0.1 peer-group uplink
       neighbor 100.65.0.1 peer-group uplink
      ...
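
      To confirm that the encap IP is actually advertised to both leafs, something along these lines could be run (a sketch; the neighbor addresses are the ones from the FRR config above):

      # 172.30.0.2/32 is expected to show up in both outputs
      podman exec -it -u root frr vtysh \
          -c 'show ip bgp neighbors 100.64.0.1 advertised-routes' \
          -c 'show ip bgp neighbors 100.65.0.1 advertised-routes'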
      

       

       

      [root@compute-krwxsiik-0 ~]# ip a s br-ex
      8: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
          link/ether a6:95:b9:16:03:44 brd ff:ff:ff:ff:ff:ff
          inet 100.64.0.2/30 brd 100.64.0.3 scope global br-ex
             valid_lft forever preferred_lft forever
          inet6 fe80::a495:b9ff:fe16:344/64 scope link 
             valid_lft forever preferred_lft forever
      [root@compute-krwxsiik-0 ~]# ip a s br-ex-2
      7: br-ex-2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
          link/ether 4a:2e:1b:bc:8f:46 brd ff:ff:ff:ff:ff:ff
          inet 100.65.0.2/30 brd 100.65.0.3 scope global br-ex-2
             valid_lft forever preferred_lft forever
          inet6 fe80::482e:1bff:febc:8f46/64 scope link 
             valid_lft forever preferred_lft forever 
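
      With the uplink addresses on br-ex and br-ex-2, the next hop the compute picks towards the remote compute's encap IP (172.30.1.2 in the traces below) can be checked with (a sketch):

      # expected to resolve via one of the two BGP uplinks
      # (100.64.0.1 on br-ex or 100.65.0.1 on br-ex-2)
      ip route get 172.30.1.2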
      

       

       

      When a packet with destination ovn-encap-ip is received by a compute, most of the time it is received by the proper process and successfully answered. Sometimes, however, the compute forwards the packet to a peer leaf, which forwards it back to the compute, entering a loop that ends when the packet TTL reaches 0.
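
      The traces below can be obtained with captures along these lines on the compute's uplink NICs (a sketch; -e prints the MAC addresses and -v the TTL and IP ID fields shown in the output):

      tcpdump -i eth2 -e -nn -v icmp
      tcpdump -i eth3 -e -nn -v icmp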

       

      Example of a packet properly received and processed by the compute:

      1. request received via eth2 (comes from compute-1 via leaf-0):

       

      10:50:02.914630 52:54:00:1c:90:02 > a6:95:b9:16:03:44, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 61, id 23167, offset 0, flags [DF], proto ICMP (1), length 84)
          172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 1, length 64

      2. reply sent via eth3 (goes to compute-1 through leaf-1):

       

       

      10:50:02.914896 4a:2e:1b:bc:8f:46 > 52:54:00:e8:1e:71, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 55443, offset 0, flags [none], proto ICMP (1), length 84)
          172.30.0.2 > 172.30.1.2: ICMP echo reply, id 2, seq 1, length 64
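
      Note the asymmetry: the request arrives via eth2 (leaf-0) while the locally generated reply leaves via eth3 (leaf-1). Which uplink the reply takes can be double-checked with (a sketch; the source address is the local encap IP):

      ip route get 172.30.1.2 from 172.30.0.2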

       

       

      Example of a packet that enters a loop until the TTL reaches 0:

      1. request received via eth2 (comes from compute-1 via leaf-0):

      10:50:03.915370 52:54:00:1c:90:02 > a6:95:b9:16:03:44, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 61, id 23642, offset 0, flags [DF], proto ICMP (1), length 84)
          172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 2, length 64

      2. the same request is wrongly forwarded to leaf-1 after decrementing the TTL:

       

      10:50:03.915851 4a:2e:1b:bc:8f:46 > 52:54:00:e8:1e:71, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 60, id 23642, offset 0, flags [DF], proto ICMP (1), length 84)
          172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 2, length 64

      3. leaf-1 receives the request with TTL=60 and sends it back to the same compute-0 with TTL=59 (notice the source MAC becomes the destination MAC and vice versa):

       

       

      05:50:03.916928 4a:2e:1b:bc:8f:46 > 52:54:00:e8:1e:71, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 60, id 23642, offset 0, flags [DF], proto ICMP (1), length 84)
          172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 2, length 64
      
      05:50:03.916952 52:54:00:e8:1e:71 > 4a:2e:1b:bc:8f:46, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 59, id 23642, offset 0, flags [DF], proto ICMP (1), length 84)
          172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 2, length 64

       

       

      And the loop continues until the TTL reaches 0 and the packet is dropped.
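
      Looping packets like these can be spotted on either uplink by filtering on an abnormally low TTL (a sketch; ip[8] is the IPv4 TTL byte, so only packets that have already bounced many times match):

      tcpdump -i eth3 -e -nn -v 'icmp and ip[8] < 50'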

              Assignee: jlibosva (Jakub Libosvar)
              Reporter: eolivare (Eduardo Olivares Toledo)
              Squad: rhos-dfg-networking-squad-bgp