-
Bug
-
Resolution: Done
-
Critical
Important
We detected this issue while running tests on RHOSP 17.1 with DVR enabled (enable_distributed_floating_ip=True) and the Octavia ovn-provider configured.
OSP compose: RHOS-17.1-RHEL-9-20231122.n.1
OVN version: ovn22.12-22.12.1-11.el9fdp.x86_64
Some VMs are created on different tenant networks. Their IPv4 tenant subnets are connected to a router that is connected to a provider network. FIPs are attached to those VMs.
The tests send traffic to those VMs, and we can confirm that the traffic is correctly distributed. We tested both SSH and ICMP. In the capture below, taken on the compute node (the VM's hypervisor), the echo request is received on interface enp3s0 with destIP=FIP and then goes to the tap interface with destIP=vmIP; the echo reply follows the opposite path:
12:33:26.269766 enp3s0 P ifindex 4 52:54:00:e2:fb:ec ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 55908, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.68 > 10.0.0.244: ICMP echo request, id 5, seq 1, length 64
12:33:26.270744 tap224b1d16-c5 Out ifindex 144 fa:16:3e:10:ea:6f ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 55908, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.68 > 10.100.0.35: ICMP echo request, id 5, seq 1, length 64
12:33:26.271174 tap224b1d16-c5 P ifindex 144 fa:16:3e:9e:d2:9e ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 29683, offset 0, flags [none], proto ICMP (1), length 84) 10.100.0.35 > 10.0.0.68: ICMP echo reply, id 5, seq 1, length 64
12:33:26.272019 enp3s0 Out ifindex 4 fa:16:3e:73:6a:2a ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 29683, offset 0, flags [none], proto ICMP (1), length 84) 10.0.0.244 > 10.0.0.68: ICMP echo reply, id 5, seq 1, length 64
Then we create, in that environment, an ovn-provider LB with a member connected to the same tenant network.
We repeat the tests, but the ICMP replies are now wrongly centralized. These are the packets captured on the compute node; the echo reply wrongly goes out through the Geneve tunnel:
12:54:17.379556 enp3s0 P ifindex 4 52:54:00:e2:fb:ec ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 16991, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.68 > 10.0.0.244: ICMP echo request, id 9, seq 1, length 64
12:54:17.380457 tap224b1d16-c5 Out ifindex 144 fa:16:3e:10:ea:6f ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 16991, offset 0, flags [DF], proto ICMP (1), length 84) 10.0.0.68 > 10.100.0.35: ICMP echo request, id 9, seq 1, length 64
12:54:17.380821 tap224b1d16-c5 P ifindex 144 fa:16:3e:9e:d2:9e ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 64, id 60208, offset 0, flags [none], proto ICMP (1), length 84) 10.100.0.35 > 10.0.0.68: ICMP echo reply, id 9, seq 1, length 64
12:54:17.381256 genev_sys_6081 Out ifindex 13 fa:16:3e:fe:df:52 ethertype IPv4 (0x0800), length 104: (tos 0x0, ttl 63, id 60208, offset 0, flags [none], proto ICMP (1), length 84) 10.0.0.244 > 10.0.0.68: ICMP echo reply, id 9, seq 1, length 64
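The difference between the two captures above (reply leaving through enp3s0 versus genev_sys_6081) could be checked automatically with something like the following Python sketch. It is only an illustration, not part of our test suite: the function name reply_path, the regular expression and the tunnel_prefix default are assumptions, and it expects lines in the same format as the captures above (timestamp, interface, direction, then the packet details).

```python
import re

# Timestamp, interface name and direction at the start of each capture line.
# Assumption: lines look like the tcpdump output quoted in this report.
LINE_RE = re.compile(r"^\S+\s+(?P<iface>\S+)\s+(?P<dir>\S+)\s")


def reply_path(lines, tunnel_prefix="genev_sys"):
    """Classify how outgoing ICMP echo replies leave the compute node.

    Returns "centralized" if any outgoing echo reply uses an interface whose
    name starts with tunnel_prefix, "distributed" if outgoing echo replies
    were seen and none used the tunnel, and None if no outgoing echo reply
    was found.
    """
    seen_reply = False
    for line in lines:
        match = LINE_RE.match(line)
        if match is None or "ICMP echo reply" not in line:
            continue
        if match.group("dir") != "Out":
            # Skip the reply as it is received from the VM's tap interface.
            continue
        if match.group("iface").startswith(tunnel_prefix):
            return "centralized"
        seen_reply = True
    return "distributed" if seen_reply else None
```

Feeding it the lines of the first capture would be expected to give "distributed", and the lines of the second capture "centralized".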
This doesn't affect SSH, which is still correctly distributed.
Pings sent from the VM (i.e., the opposite direction) are correctly distributed too. So far, the issue has only been reproduced with pings received by the VM.
This issue is 100% reproducible.
We are trying to determine whether this is a regression, but we are not sure yet.