Bug
Resolution: Done
Critical
None
rhel-9
None
8
False
False
rhel-9
None
rhel-net-ovs-dpdk
ssg_networking
OVS/DPDK - FDP-25.E - 1, FDP-OVS/DPDK Sprint 7
2
Customer Escalated, Customer Facing, Customer Reported
Previously, in FDP-1474 and FDP-1273, a customer identified that a critical learn action was not updating OpenFlow rules as it should during a bandwidth test that included a VXLAN tunnel failover. The failover caused a large amount of GARP/ND traffic from the new VXLAN tunnel, but OVS continued to send traffic to the old tunnel.
Upon investigation, we found that in this situation OVS treated GARP/ND traffic the same as all other traffic. If any packets from the old tunnel happened to be processed after the failover GARP packets were received, the learn rule would simply overwrite the return path with the out-of-date one.
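The ordering problem described above can be illustrated with a toy model. This is only a sketch, not OVS code: it tracks a single learned destination, assumes ordinary traffic is learned at priority 1 (the ticket does not state the default priority), ignores hard_timeout, and uses hypothetical helper names (learn, lookup, reset_flows) and port labels (old-tunnel, new-tunnel).

```shell
#!/bin/sh
# Toy model of the learned flows in table 20 for ONE destination MAC.
# slot_p1 / slot_p2 hold the output port learned at priority 1 / 2
# (empty string = no flow). Priority 1 for normal traffic is an assumption.
slot_p1=""
slot_p2=""

# learn PRIORITY PORT: add or replace the learned flow at that priority,
# mimicking how a learned flow with identical match and priority is overwritten.
learn() {
    case "$1" in
        1) slot_p1="$2" ;;
        2) slot_p2="$2" ;;
    esac
}

# lookup: print the port selected by the highest-priority flow present.
lookup() {
    if [ -n "$slot_p2" ]; then
        echo "$slot_p2"
    else
        echo "$slot_p1"
    fi
}

# reset_flows: remove both learned flows.
reset_flows() {
    slot_p1=""
    slot_p2=""
}

# Scenario A: GARP is learned at the same priority as all other traffic.
scenario_equal_priority() {
    reset_flows
    learn 1 old-tunnel   # traffic before the failover
    learn 1 new-tunnel   # GARP arrives through the new tunnel
    learn 1 old-tunnel   # straggler packet from the old tunnel
    lookup
}

# Scenario B: GARP/ND is learned at priority 2, as in the workaround.
scenario_garp_priority() {
    reset_flows
    learn 1 old-tunnel
    learn 2 new-tunnel
    learn 1 old-tunnel   # straggler only touches the priority-1 flow
    lookup
}

echo "equal priority: $(scenario_equal_priority)"
echo "GARP at priority 2: $(scenario_garp_priority)"
```

In this model the straggler packet wins in the first scenario, while in the second the higher-priority GARP-learned flow keeps pointing at the new tunnel.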
This setup is complex, so I created a reproducer environment to help clarify the configuration. It is attached to this ticket.
Both in my reproduction environment and on the client systems, the following workaround, which causes GARP/ND traffic to create a higher-priority flow when learned, resolved the issue:
ovs-ofctl add-flow br-tun "table=10,arp,arp_tha=ff:ff:ff:ff:ff:ff,priority=2 actions=learn(table=20,hard_timeout=300,priority=2,cookie=0x4291b5d8aea40b08,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:patch-int"
ovs-ofctl add-flow br-tun "table=10,icmp6,icmp_type=134,priority=2 actions=learn(table=20,hard_timeout=300,priority=2,cookie=0x4291b5d8aea40b08,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:patch-int"
ovs-ofctl add-flow br-tun "table=10,icmp6,icmp_type=136,priority=2 actions=learn(table=20,hard_timeout=300,priority=2,cookie=0x4291b5d8aea40b08,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:patch-int"
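Since the three workaround flows differ only in their match, they could be generated from one template to avoid copy-and-paste drift. The sketch below is a dry run that only prints the commands and does not call ovs-ofctl; the helper names garp_nd_flow and print_workaround are hypothetical, and the learn() action is copied verbatim from the commands above.

```shell
#!/bin/sh
# learn() action shared by all three workaround flows, copied from the ticket.
LEARN='learn(table=20,hard_timeout=300,priority=2,cookie=0x4291b5d8aea40b08,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[])'

# garp_nd_flow MATCH: print the flow specification for one match expression.
garp_nd_flow() {
    printf 'table=10,%s,priority=2 actions=%s,output:patch-int\n' "$1" "$LEARN"
}

# print_workaround: print (not execute) the three ovs-ofctl commands.
print_workaround() {
    for match in \
        'arp,arp_tha=ff:ff:ff:ff:ff:ff' \
        'icmp6,icmp_type=134' \
        'icmp6,icmp_type=136'
    do
        printf 'ovs-ofctl add-flow br-tun "%s"\n' "$(garp_nd_flow "$match")"
    done
}

print_workaround
```

The printed lines can be reviewed and then pasted into a shell on the affected host.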
Three possible solutions are:
- Modify OVS's action xlate to bump the priority of flows learned from GARP/ND traffic. This solution is effective, but it violates the specification of what learn is supposed to do.
- Make learned flow_add / revalidation more sensitive to when packets are received, or favor keeping newer, different flows over older ones. This will not directly resolve this issue and may just mask it.
- Modify the ML2/OVS interface to maintain GARP/ND rules along with learn actions.
Is triggering: OSPRH-18938 Investigate Neutron workaround for FDP-1562 (Closed)