-
Bug
-
Resolution: Unresolved
Important
This affects only BGP setups with ovn-routing configured.
It affects connectivity that uses geneve tunnels between compute nodes. Hence:
- no packet loss when connecting from an external machine to VM FIPs (or to VMs connected directly to an external network).
- no packet loss between VMs running on the same compute.
- packet loss between VMs running on different computes and connected to the same tenant network.
- packet loss between VMs running on different computes and connected to different tenant networks (networks connected through a neutron router).
The issue seems to be related to a problem establishing geneve tunnels when BGP is used with ovn-routing. Each compute's ovn-encap-ip address is configured on its loopback interface and exposed via BGP. Each compute is connected to its peer leafs through eth2 and eth3 (connected to leaf-0 and leaf-1 respectively), but the corresponding IPs are configured on the compute's br-ex and br-ex-2 interfaces respectively.
[root@compute-krwxsiik-0 ~]# ovs-vsctl get open . external_ids:ovn-encap-ip
"172.30.0.2"

[root@compute-krwxsiik-0 ~]# podman exec -ituroot frr vtysh -c 'show run'
...
router bgp 64999
 bgp router-id 172.30.0.2
 ...
 neighbor 100.64.0.1 peer-group uplink
 neighbor 100.65.0.1 peer-group uplink
...

[root@compute-krwxsiik-0 ~]# ip a s br-ex
8: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether a6:95:b9:16:03:44 brd ff:ff:ff:ff:ff:ff
    inet 100.64.0.2/30 brd 100.64.0.3 scope global br-ex
       valid_lft forever preferred_lft forever
    inet6 fe80::a495:b9ff:fe16:344/64 scope link
       valid_lft forever preferred_lft forever

[root@compute-krwxsiik-0 ~]# ip a s br-ex-2
7: br-ex-2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether 4a:2e:1b:bc:8f:46 brd ff:ff:ff:ff:ff:ff
    inet 100.65.0.2/30 brd 100.65.0.3 scope global br-ex-2
       valid_lft forever preferred_lft forever
    inet6 fe80::482e:1bff:febc:8f46/64 scope link
       valid_lft forever preferred_lft forever
When a compute receives a packet whose destination is its ovn-encap-ip, most of the time the packet is delivered to the proper process and successfully answered. Sometimes, however, the compute forwards the packet to a peer leaf, which forwards it back to the compute, entering a loop that in some cases only ends when the packet TTL reaches 0.
Example of a packet properly received and processed by the compute:
1. request received via eth2 (comes from compute-1 via leaf-0):
10:50:02.914630 52:54:00:1c:90:02 > a6:95:b9:16:03:44, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 61, id 23167, offset 0, flags [DF], proto ICMP (1), length 84) 172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 1, length 64
2. reply sent via eth3 (goes to compute-1 through leaf-1):
10:50:02.914896 4a:2e:1b:bc:8f:46 > 52:54:00:e8:1e:71, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 55443, offset 0, flags [none], proto ICMP (1), length 84) 172.30.0.2 > 172.30.1.2: ICMP echo reply, id 2, seq 1, length 64
Example of a packet that enters a loop until its TTL reaches 0:
1. request received via eth2 (comes from compute-1 via leaf-0):
10:50:03.915370 52:54:00:1c:90:02 > a6:95:b9:16:03:44, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 61, id 23642, offset 0, flags [DF], proto ICMP (1), length 84) 172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 2, length 64
2. the same request is wrongly forwarded to leaf-1, after decreasing the TTL:
10:50:03.915851 4a:2e:1b:bc:8f:46 > 52:54:00:e8:1e:71, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 60, id 23642, offset 0, flags [DF], proto ICMP (1), length 84) 172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 2, length 64
3. leaf-1 receives the request with TTL=60 and sends it back to compute-0 with TTL=59 (note that the source and destination MACs are swapped):
05:50:03.916928 4a:2e:1b:bc:8f:46 > 52:54:00:e8:1e:71, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 60, id 23642, offset 0, flags [DF], proto ICMP (1), length 84) 172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 2, length 64
05:50:03.916952 52:54:00:e8:1e:71 > 4a:2e:1b:bc:8f:46, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 59, id 23642, offset 0, flags [DF], proto ICMP (1), length 84) 172.30.1.2 > 172.30.0.2: ICMP echo request, id 2, seq 2, length 64
The loop continues until the TTL reaches 0 and the packet is dropped.
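To help spot looping packets in longer captures, the following is a minimal sketch (not part of the original report) that scans `tcpdump -e -v` style lines like the ones above and reports packets seen with more than one TTL. It assumes the IP `id` field together with the source and destination addresses identifies a packet within a short capture, which may not hold if IDs are reused; the function name `find_looping_packets` and the regular expression are illustrative only.

```python
import re
import sys
from collections import defaultdict

# Fields of interest in a `tcpdump -e -v` line, in the layout shown in the
# captures above (assumption: other tcpdump options may change this layout).
LINE_RE = re.compile(
    r"(?P<src_mac>[0-9a-f:]{17}) > (?P<dst_mac>[0-9a-f:]{17}), .*?"
    r"ttl (?P<ttl>\d+), id (?P<ip_id>\d+), .*?"
    r"(?P<src_ip>\d+\.\d+\.\d+\.\d+) > (?P<dst_ip>\d+\.\d+\.\d+\.\d+):"
)


def find_looping_packets(lines):
    """Map (src_ip, dst_ip, ip_id) to the distinct TTLs seen, in capture order.

    Only packets observed with more than one TTL are returned, since a packet
    that is simply delivered should appear with a single TTL value.
    """
    ttls = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a packet line in the expected format
        key = (match["src_ip"], match["dst_ip"], int(match["ip_id"]))
        ttl = int(match["ttl"])
        if ttl not in ttls[key]:
            ttls[key].append(ttl)
    return {key: seen for key, seen in ttls.items() if len(seen) > 1}


if __name__ == "__main__":
    # Hypothetical usage: pipe a saved text capture into the script.
    for (src, dst, ip_id), seen in find_looping_packets(sys.stdin).items():
        print(f"{src} > {dst} id {ip_id}: TTLs seen {seen}")
```

Fed the capture lines from this report, the sketch would flag only the request with id 23642, which appears with TTL 61, 60 and 59, while the correctly answered request (id 23167) and its reply (id 55443) each appear with a single TTL.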
- blocks
-
OSPRH-884 BZ#2127951 BGP Fast Data Path support
- Testing