Fast Datapath Product / FDP-182

Echo replies from VMs wrongly centralized after creating an OVN-provider LB

    • Bug
    • Resolution: Done
    • Critical
    • ovn22.12
    • Important

      We detected this issue while running tests on RHOSP 17.1 with DVR enabled (enable_distributed_floating_ip=True) and the Octavia ovn-provider configured.
      OSP compose: RHOS-17.1-RHEL-9-20231122.n.1
      OVN version: ovn22.12-22.12.1-11.el9fdp.x86_64

      Some VMs are created on different tenant networks. Their IPv4 tenant subnets are connected to a router that is connected to a provider network. FIPs are attached to those VMs.

      The tests send traffic to those VMs and check that the traffic is correctly distributed. We tested both SSH and ICMP. In the capture below, taken on the compute node (the VM's hypervisor), the echo request arrives on interface enp3s0 with destIP=FIP, then goes to the tap interface with destIP=vmIP, and the echo reply follows the opposite path:

      12:33:26.269766 enp3s0 P   ifindex 4 52:54:00:e2:fb:ec ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 55908, offset 0, flags [DF], proto ICMP (1), length 84)                                                         
          10.0.0.68 > 10.0.0.244: ICMP echo request, id 5, seq 1, length 64             
      12:33:26.270744 tap224b1d16-c5 Out ifindex 144 fa:16:3e:10:ea:6f ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 55908, offset 0, flags [DF], proto ICMP (1), length 84)                                                   
          10.0.0.68 > 10.100.0.35: ICMP echo request, id 5, seq 1, length 64            
      12:33:26.271174 tap224b1d16-c5 P   ifindex 144 fa:16:3e:9e:d2:9e ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 29683, offset 0, flags [none], proto ICMP (1), length 84)                                           
          10.100.0.35 > 10.0.0.68: ICMP echo reply, id 5, seq 1, length 64               
      12:33:26.272019 enp3s0 Out ifindex 4 fa:16:3e:73:6a:2a ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 29683, offset 0, flags [none], proto ICMP (1), length 84)                                                   
          10.0.0.244 > 10.0.0.68: ICMP echo reply, id 5, seq 1, length 64           
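The path described above can be read off the capture mechanically. The following is a minimal sketch, assuming the capture has the two-line-per-packet layout shown here (a header line with timestamp, interface and direction, followed by an indented line with the addresses). The names `parse_hops`, `HEADER`, `PAYLOAD` and `SAMPLE` are hypothetical, and the header lines in `SAMPLE` are shortened after the frame length for brevity.

```python
import re

# Header line: timestamp, interface name, direction flag, then "ifindex".
HEADER = re.compile(r"^\s*(?P<ts>\d\d:\d\d:\d\d\.\d+) (?P<iface>\S+)\s+(?P<dir>\S+)\s+ifindex")
# Continuation line: source address, destination address, ICMP echo type.
PAYLOAD = re.compile(r"^\s*(?P<src>[\d.]+) > (?P<dst>[\d.]+): ICMP echo (?P<kind>request|reply)")


def parse_hops(capture):
    """Return one dict per packet, pairing each header line with its payload line."""
    hops = []
    pending = None
    for line in capture.splitlines():
        header = HEADER.match(line)
        if header:
            pending = header.groupdict()
            continue
        payload = PAYLOAD.match(line)
        if payload and pending is not None:
            hops.append({**pending, **payload.groupdict()})
            pending = None
    return hops


# The four packets from the capture above (header lines shortened).
SAMPLE = """\
12:33:26.269766 enp3s0 P   ifindex 4 52:54:00:e2:fb:ec ethertype IPv4 (0x0800), length 104:
    10.0.0.68 > 10.0.0.244: ICMP echo request, id 5, seq 1, length 64
12:33:26.270744 tap224b1d16-c5 Out ifindex 144 fa:16:3e:10:ea:6f ethertype IPv4 (0x0800), length 104:
    10.0.0.68 > 10.100.0.35: ICMP echo request, id 5, seq 1, length 64
12:33:26.271174 tap224b1d16-c5 P   ifindex 144 fa:16:3e:9e:d2:9e ethertype IPv4 (0x0800), length 104:
    10.100.0.35 > 10.0.0.68: ICMP echo reply, id 5, seq 1, length 64
12:33:26.272019 enp3s0 Out ifindex 4 fa:16:3e:73:6a:2a ethertype IPv4 (0x0800), length 104:
    10.0.0.244 > 10.0.0.68: ICMP echo reply, id 5, seq 1, length 64
"""

hops = parse_hops(SAMPLE)
for hop in hops:
    print(hop["iface"], hop["dir"], hop["kind"], hop["src"], "->", hop["dst"])
```

Reading the parsed hops in order shows the request's destination changing from the FIP (10.0.0.244) to the VM address (10.100.0.35) between enp3s0 and the tap interface, and the reply's source changing back before it leaves through enp3s0.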

      Then, in that environment, we create an ovn-provider LB with a member connected to the same tenant network.

      We repeat the tests, but the ICMP replies are now wrongly centralized. These are the packets captured on the compute node; the echo reply wrongly goes out through the geneve tunnel:

      12:54:17.379556 enp3s0 P   ifindex 4 52:54:00:e2:fb:ec ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 16991, offset 0, flags [DF], proto ICMP (1), length 84)
          10.0.0.68 > 10.0.0.244: ICMP echo request, id 9, seq 1, length 64
      12:54:17.380457 tap224b1d16-c5 Out ifindex 144 fa:16:3e:10:ea:6f ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 16991, offset 0, flags [DF], proto ICMP (1), length 84)
          10.0.0.68 > 10.100.0.35: ICMP echo request, id 9, seq 1, length 64
      12:54:17.380821 tap224b1d16-c5 P   ifindex 144 fa:16:3e:9e:d2:9e ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 60208, offset 0, flags [none], proto ICMP (1), length 84)
          10.100.0.35 > 10.0.0.68: ICMP echo reply, id 9, seq 1, length 64
      12:54:17.381256 genev_sys_6081 Out ifindex 13 fa:16:3e:fe:df:52 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 60208, offset 0, flags [none], proto ICMP (1), length 84)
          10.0.0.244 > 10.0.0.68: ICMP echo reply, id 9, seq 1, length 64
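The difference between the two captures comes down to which interface the translated echo reply leaves through. As a minimal sketch of that check, assuming the provider NIC is enp3s0 and the tunnel device name starts with `genev_sys_` as in the captures above, the hops can be classified as follows. The function `classify_reply` and the hand-transcribed tuples are hypothetical helpers for illustration, not part of any OVN or OpenStack tooling.

```python
TUNNEL_PREFIX = "genev_sys_"  # Geneve tunnel device prefix, as seen in the capture


def classify_reply(hops, provider_nic="enp3s0"):
    """Classify the path of the last outgoing echo reply.

    hops: (interface, direction, icmp_kind) tuples in capture order.
    """
    for iface, direction, kind in reversed(hops):
        if kind == "reply" and direction == "Out":
            if iface == provider_nic:
                return "distributed"  # reply leaves directly via the provider NIC
            if iface.startswith(TUNNEL_PREFIX):
                return "centralized"  # reply is tunneled to another chassis
            return "unknown"
    return "no reply seen"


# Hops transcribed by hand from the capture taken before creating the LB.
before_lb = [
    ("enp3s0", "P", "request"),
    ("tap224b1d16-c5", "Out", "request"),
    ("tap224b1d16-c5", "P", "reply"),
    ("enp3s0", "Out", "reply"),
]

# Hops transcribed by hand from the capture taken after creating the LB.
after_lb = [
    ("enp3s0", "P", "request"),
    ("tap224b1d16-c5", "Out", "request"),
    ("tap224b1d16-c5", "P", "reply"),
    ("genev_sys_6081", "Out", "reply"),
]

print(classify_reply(before_lb))
print(classify_reply(after_lb))
```

With these inputs the first call reports `distributed` and the second `centralized`, matching the behavior described in this report.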

      This doesn't affect SSH, which is correctly distributed.

      Pings sent from the VM (i.e., the opposite direction) are correctly distributed too. So far, the issue has only been reproduced with pings received by the VM.

      This issue is 100% reproducible.

      We are trying to determine whether this is a regression, but we are not sure yet.

              amusil@redhat.com Ales Musil
              eolivare Eduardo Olivares Toledo