Fast Datapath Product / FDP-628

ECMP symmetric reply support for connections initiated from behind the route

    • Type: Feature Request
    • Resolution: Done-Errata
    • Priority: Normal
    • FDP-24.G
    • None
    • ovn24.03
    • 5
    • False
    • False
    • FDP 24.F

      Due to the issues found with dp_hash:

      https://bugzilla.redhat.com/show_bug.cgi?id=2175716 - TL;DR: for locally generated traffic, the kernel picks a random hash value per socket instead of hashing the packet's full tuple.

      OVN therefore switched to the L4_SYM hash, which is computed by OVS rather than taken from the datapath:

      https://github.com/ovn-org/ovn/commit/596ea7acbe687fdf780389e664ffef98f3806b53
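
      To illustrate why a field-based hash avoids the dp_hash problem: it depends only on a connection's addresses, ports, and protocol, so every packet of a session maps to the same value, and a symmetric hash also gives replies the same value. The Python sketch below is a hypothetical illustration of a symmetric L4 hash; the `FiveTuple` fields, the `symmetric_l4_hash` name, and the use of BLAKE2 are assumptions, and this is not the OVS L4_SYM implementation.

```python
import hashlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


def symmetric_l4_hash(t: FiveTuple) -> int:
    """Hypothetical symmetric L4 hash: A->B and B->A hash to the same value.

    Illustrative only; this is not the OVS L4_SYM implementation.
    """
    # Order the two endpoints so the hash does not depend on packet direction.
    lo, hi = sorted([(t.src_ip, t.src_port), (t.dst_ip, t.dst_port)])
    key = f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}|{t.proto}".encode()
    # BLAKE2 is deterministic across processes, unlike Python's hash() on str.
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")


# A pod-initiated TCP flow and its reply direction.
fwd = FiveTuple("10.0.0.5", "192.0.2.10", 6, 40000, 443)
rev = FiveTuple("192.0.2.10", "10.0.0.5", 6, 443, 40000)
assert symmetric_l4_hash(fwd) == symmetric_l4_hash(rev)
```

      Unlike a random per-socket value, the result is reproducible from the packet headers alone, which is the property the switch away from dp_hash relies on.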

       

      OVN-Kubernetes uses this OVN hash mechanism to implement ECMP routing, in a feature known as Multiple External Gateway (MEG). In the egress case, a pod sends traffic that may have one or more gateways available; each TCP session is hashed and sent to one of the next hops.

       

      However, when a gateway is added or removed, the OpenFlow group buckets change. Packets of an existing TCP session from a pod may then be ECMP-hashed to a different next hop. From a routing perspective this is fine, but when the next hop is stateful, the new next hop has no state for the session and may drop its packets.
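
      The effect of a bucket change can be shown with a small simulation. The Python sketch below assumes simple hash-modulo bucket selection, which is an assumption for illustration (OVS select-group bucket selection differs in detail); the gateway names, flow strings, and helper functions are hypothetical.

```python
import hashlib


def flow_hash(flow: str) -> int:
    # Deterministic 64-bit hash of a flow identifier.
    return int.from_bytes(hashlib.blake2b(flow.encode(), digest_size=8).digest(), "big")


def pick_next_hop(flow: str, next_hops: list[str]) -> str:
    # Assumed hash-modulo bucket selection; OVS's select-group logic differs in detail.
    return next_hops[flow_hash(flow) % len(next_hops)]


def moved_sessions(flows: list[str], before: list[str], after: list[str]) -> list[str]:
    """Flows whose chosen next hop differs between two gateway sets."""
    return [f for f in flows if pick_next_hop(f, before) != pick_next_hop(f, after)]


# 1000 hypothetical TCP sessions from one pod to one server.
flows = [f"10.0.0.5:{port}->192.0.2.10:443/tcp" for port in range(40000, 41000)]
gws_before = ["gw-a", "gw-b", "gw-c"]
gws_after = gws_before + ["gw-d"]  # a fourth gateway is added

moved = moved_sessions(flows, gws_before, gws_after)
print(f"{len(moved)} of {len(flows)} existing sessions now pick a different gateway")
```

      With modulo selection, a flow keeps its gateway when going from three to four buckets only if its hash is 0, 1, or 2 modulo 12, so roughly three quarters of the existing sessions would be expected to move, each of them landing on a next hop that has no state for it.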

      What we need to determine for this bug:

      1. Does this problem exist with the L4_SYM hash? Yes, according to Ilya's investigation.
      2. Did this problem exist with dp_hash?
      3. Can we add a feature, gated by a configuration flag, that solves this problem?

      Potential solution:

      For ingress traffic, OVN already stores the sender's MAC address in the conntrack entry. When the pod then sends reply traffic for that connection, OVN skips the ECMP lookup if the conntrack entry already holds a stored MAC address. We can reuse the same design for egress: the ECMP flows hash only the first packet of a connection (when the CT state is new) and store the chosen next hop's MAC address in the conntrack entry. Later packets of the connection keep using that next hop, unless the next hop goes down while BFD is enabled, in which case all conntrack entries with that MAC should be flushed.
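
      A toy model of the proposed egress behavior may help when reasoning about it. In the Python sketch below, a dict stands in for the conntrack table and maps each connection to the next-hop MAC chosen for its first packet. `EcmpRouter`, its methods, and the MAC addresses are hypothetical, and the hash-modulo selection is an assumption rather than OVN's actual flow logic.

```python
import hashlib
from dataclasses import dataclass, field


def _hash(key: str) -> int:
    # Deterministic 64-bit hash of a connection identifier.
    return int.from_bytes(hashlib.blake2b(key.encode(), digest_size=8).digest(), "big")


@dataclass
class EcmpRouter:
    """Hypothetical model: hash only new connections, then pin the next-hop MAC."""

    next_hop_macs: list[str]
    # Stand-in for conntrack: connection key -> next-hop MAC stored in the entry.
    conntrack: dict[str, str] = field(default_factory=dict)

    def forward(self, conn: str) -> str:
        mac = self.conntrack.get(conn)
        if mac is not None:
            # Existing entry: skip the ECMP lookup and reuse the stored MAC.
            return mac
        if not self.next_hop_macs:
            raise RuntimeError("no live next hop")
        # New connection (ct_state=new): hash once and store the chosen MAC.
        mac = self.next_hop_macs[_hash(conn) % len(self.next_hop_macs)]
        self.conntrack[conn] = mac
        return mac

    def add_next_hop(self, mac: str) -> None:
        # Changes the bucket set, but pinned connections keep their next hop.
        self.next_hop_macs.append(mac)

    def next_hop_down(self, mac: str) -> None:
        # E.g. reported down by BFD: drop the hop and flush every entry pinned
        # to its MAC so those connections are re-hashed on their next packet.
        self.next_hop_macs.remove(mac)
        self.conntrack = {c: m for c, m in self.conntrack.items() if m != mac}


conn = "10.0.0.5:40000->198.51.100.7:443/tcp"
router = EcmpRouter(["00:00:5e:00:53:01", "00:00:5e:00:53:02"])
first = router.forward(conn)
router.add_next_hop("00:00:5e:00:53:03")
assert router.forward(conn) == first  # still pinned after a gateway is added
router.next_hop_down(first)
assert router.forward(conn) != first  # re-hashed onto a remaining next hop
```

      The property being modeled is that gateway changes only affect connections created afterwards; an existing session moves only when its pinned next hop is declared down and its entries are flushed.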

       

            nusiddiq@redhat.com Siddique Numan
            trozet@redhat.com Tim Rozet
            Ehsan Elahi
            Votes: 0
            Watchers: 7
