Fast Datapath Product / FDP-1562

OVS learn action as configured by ML2 is not sensitive enough to GARP/ND packets during high throughput

    • Bug
    • Resolution: Done
    • Critical
    • rhel-9
    • openvswitch3.3
    • 8
    • False
    • False
    • rhel-9
    • rhel-net-ovs-dpdk
    • ssg_networking
    • OVS/DPDK - FDP-25.E - 1, FDP-OVS/DPDK Sprint 7
    • 2
    • Customer Escalated, Customer Facing, Customer Reported

      Previously, in FDP-1474 and FDP-1273, a customer found that a critical learn action was not updating OpenFlow rules as expected during a bandwidth test that included a VXLAN tunnel failover. The failover generated a large amount of GARP/ND traffic from the new VXLAN tunnel, but OVS continued to send traffic to the old tunnel.

      Upon investigation, we found that in this situation OVS treated GARP/ND traffic the same as all other traffic. If any packets from the old tunnel happened to be processed after the failover GARP packets were received, the learn action would simply relearn the out-of-date return path, overwriting the one learned from the GARP packets.
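      The overwrite described above can be illustrated with a small Python model of the learned return-path flows in br-tun table 20. This is a simplified sketch, not OVS code: it assumes each learned flow is identified by its match (VLAN, destination MAC) and its priority, that every learned flow uses the same priority (1 here, assumed), and that re-learning an identical match at the same priority replaces the output port. The MAC address, VLAN, and port names (`vxlan-old`, `vxlan-new`) are hypothetical.

```python
"""Simplified model of learn-action overwrites in br-tun table 20 (not OVS code)."""

from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    vlan: int
    src_mac: str
    in_port: str  # hypothetical tunnel port name, e.g. "vxlan-old"


def learn_all(packets: list[Packet]) -> dict[tuple[int, int, str], str]:
    """Apply the table-10 learn action to each packet in arrival order.

    All learned flows share one priority (assumed to be 1), so a later packet
    with the same (VLAN, source MAC) replaces the earlier entry.
    """
    table: dict[tuple[int, int, str], str] = {}
    for pkt in packets:
        # NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[] plus output:OXM_OF_IN_PORT[]:
        # traffic to this MAC is sent back out the port the packet arrived on.
        table[(1, pkt.vlan, pkt.src_mac)] = pkt.in_port
    return table


VM_MAC = "fa:16:3e:00:00:01"  # hypothetical MAC of the failed-over endpoint

arrivals = [
    Packet(100, VM_MAC, "vxlan-old"),  # traffic before the failover
    Packet(100, VM_MAC, "vxlan-new"),  # GARP from the new tunnel
    Packet(100, VM_MAC, "vxlan-old"),  # late packet still in flight on the old tunnel
]

table = learn_all(arrivals)
print(table[(1, 100, VM_MAC)])  # prints "vxlan-old": the stale path wins
```

      Because the learned flow has a single slot per match and priority, the last packet processed decides the return path, regardless of whether it was a GARP/ND announcement.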

      This setup is complex, so I created a reproducer environment to help elucidate the configuration. It is attached to this ticket.

      Both in my reproduction environment and on the customer systems, the following workaround, which makes GARP/ND packets learn a higher-priority flow, resolved the issue:

      # GARP: ARP packets whose target hardware address is the broadcast address
      ovs-ofctl add-flow br-tun "table=10,arp,arp_tha=ff:ff:ff:ff:ff:ff,priority=2 actions=learn(table=20,hard_timeout=300,priority=2,cookie=0x4291b5d8aea40b08,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:patch-int"
      # ICMPv6 type 134 (Router Advertisement)
      ovs-ofctl add-flow br-tun "table=10,icmp6,icmp_type=134,priority=2 actions=learn(table=20,hard_timeout=300,priority=2,cookie=0x4291b5d8aea40b08,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:patch-int"
      # ICMPv6 type 136 (Neighbor Advertisement)
      ovs-ofctl add-flow br-tun "table=10,icmp6,icmp_type=136,priority=2 actions=learn(table=20,hard_timeout=300,priority=2,cookie=0x4291b5d8aea40b08,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:OXM_OF_IN_PORT[]),output:patch-int"
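      The effect of the workaround can be sketched with a simplified Python model of the learned flows in table 20 (not OVS code). In this hypothetical model, packets that match one of the three workaround flows (ARP with a broadcast `arp_tha`, or ICMPv6 type 134 or 136) learn at priority 2, all other packets learn at priority 1 (assumed to be the default table-10 learn flow), and lookup picks the highest-priority matching flow, as OpenFlow does. Packet field names, MAC, VLAN, and port names are illustrative assumptions, and `hard_timeout=300` is not modeled.

```python
"""Sketch of the workaround: GARP/ND packets learn a higher-priority return path."""

from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass(frozen=True)
class Packet:
    vlan: int
    src_mac: str
    in_port: str                      # hypothetical tunnel port name
    eth_type: str = "ipv4"            # "arp", "ipv6", or "ipv4"
    arp_tha: str = ""                 # ARP target hardware address (ARP only)
    icmp6_type: Optional[int] = None  # ICMPv6 type (IPv6 only)


def learn_priority(pkt: Packet) -> int:
    """Mirror the match fields of the three priority=2 workaround flows."""
    if pkt.eth_type == "arp" and pkt.arp_tha == BROADCAST:
        return 2
    if pkt.eth_type == "ipv6" and pkt.icmp6_type in (134, 136):
        return 2
    return 1  # assumed priority of the default table-10 learn flow


def learn_all(packets: list[Packet]) -> dict[tuple[int, int, str], str]:
    """Learn in arrival order; same match and priority overwrites, others coexist."""
    table: dict[tuple[int, int, str], str] = {}
    for pkt in packets:
        table[(learn_priority(pkt), pkt.vlan, pkt.src_mac)] = pkt.in_port
    return table


def lookup(table: dict[tuple[int, int, str], str], vlan: int, dst_mac: str) -> Optional[str]:
    """Return the output port of the highest-priority learned flow that matches."""
    candidates = [(prio, port) for (prio, v, mac), port in table.items()
                  if v == vlan and mac == dst_mac]
    return max(candidates)[1] if candidates else None


VM_MAC = "fa:16:3e:00:00:01"  # hypothetical

arrivals = [
    Packet(100, VM_MAC, "vxlan-old"),                                      # before failover
    Packet(100, VM_MAC, "vxlan-new", eth_type="arp", arp_tha=BROADCAST),   # GARP
    Packet(100, VM_MAC, "vxlan-old"),                                      # late old-tunnel packet
]

table = learn_all(arrivals)
print(lookup(table, 100, VM_MAC))  # prints "vxlan-new"
```

      The late packet from the old tunnel still rewrites the priority-1 entry, but the priority-2 entry learned from the GARP takes precedence at lookup time, so return traffic follows the new tunnel.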
      

      Three possible solutions are:

      1. Modify OVS action translation (xlate) to raise the priority of flows learned from GARP/ND packets. This is effective, but it violates the specified behavior of the learn action.
      2. Make learned-flow installation (flow_add) and revalidation more sensitive to the order in which packets are received, or prefer keeping newer, differing flows over older ones. This would not directly resolve the issue and might only mask it.
      3. Modify the ML2/OVS interface to maintain higher-priority GARP/ND learn rules, like the workaround above, alongside the existing learn actions.

      Attachment: worklog11 (13 kB, Mike Pattrick)

      ovsdpdk-triage (ovsdpdk triage)
      rh-ee-mpattric (Mike Pattrick)
      Votes: 0
      Watchers: 7
