Type: Feature Request
Resolution: Done-Errata
Priority: Normal
Due to the issues found with dp_hash:
https://bugzilla.redhat.com/show_bug.cgi?id=2175716 - tl;dr: for locally generated traffic the kernel assigns the socket a random hash value rather than fully hashing on the tuple.
OVN switched to an OVS-based symmetric L4 (L4_SYM) hash:
https://github.com/ovn-org/ovn/commit/596ea7acbe687fdf780389e664ffef98f3806b53
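For illustration, a minimal ovs-ofctl sketch of the two selection methods on an OpenFlow select group. The bridge name, group id, ports, and field list are assumptions for the example, not the exact flows OVN programs:

    # dp_hash: bucket choice relies on the datapath-computed hash, which
    # BZ 2175716 showed is effectively random for locally generated traffic.
    ovs-ofctl -O OpenFlow15 add-group br-int 'group_id=1,type=select,selection_method=dp_hash,bucket=output:10,bucket=output:11'

    # hash: OVS hashes the listed header fields itself, so packets of the
    # same TCP tuple consistently select the same bucket.
    ovs-ofctl -O OpenFlow15 mod-group br-int 'group_id=1,type=select,selection_method=hash,fields(ip_src,ip_dst,tcp_src,tcp_dst),bucket=output:10,bucket=output:11'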
OVN-Kubernetes uses this hash mechanism with OVN to do ECMP routing, a feature known as Multiple External Gateway (MEG). In the egress case, a pod sends traffic that may be routed via one or more gateways; the traffic is hashed per TCP session and forwarded to one of the next hops.
However, if the number of gateways goes up or down, the OpenFlow group buckets change. When that happens, traffic of an established TCP session for a pod may be ECMP hashed to a different next hop. From a routing perspective this is OK, but it becomes a problem when the next hop is stateful.
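A hedged sketch of the failure mode, reusing the illustrative group above: replacing the bucket list re-maps the hash space, so in-flight sessions can land on a different next hop:

    # Illustrative only: a gateway is removed, shrinking the bucket list.
    # mod-group installs the new bucket set, the hash-to-bucket mapping
    # changes, and established sessions may now hash to the remaining
    # next hop, breaking any connection state the old next hop held.
    ovs-ofctl -O OpenFlow15 mod-group br-int 'group_id=1,type=select,selection_method=hash,fields(ip_src,ip_dst,tcp_src,tcp_dst),bucket=output:10'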
What we need to identify with this bug:
- Does this problem exist with the L4_SYM hash? Yes, according to the investigation done by Ilya.
- Did this problem exist with dp_hash?
- Can we come up with a config flag and feature to solve this problem?
Potential solution to the problem:
With ingress traffic, OVN stores the sender's MAC address in conntrack. When ingress reply traffic comes back from the pod, OVN skips the ECMP lookup if the conntrack entry already has a stored MAC address. We can leverage the same design for egress: the ECMP flows hash only the first packet of a connection (while the CT state is new) and store the chosen next hop's MAC address in the conntrack entry. Subsequent packets keep using that next hop, unless the next hop goes down and BFD is enabled, in which case all conntrack entries storing that MAC should be flushed.
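The ingress behavior described above corresponds to the existing ECMP symmetric-reply option on logical router static routes. A minimal sketch of enabling it, with illustrative router and route values; the egress analogue at the end is hypothetical, not an existing flag:

    # Existing ingress mechanism: with --ecmp-symmetric-reply, OVN commits
    # the connection to conntrack and remembers the original sender's MAC,
    # so replies from the pod skip the ECMP lookup and return via the same
    # gateway. Router name, prefix, and next hops are illustrative.
    ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 10.0.0.0/24 192.168.1.1
    ovn-nbctl --ecmp-symmetric-reply lr-route-add lr0 10.0.0.0/24 192.168.1.2

    # Proposed egress analogue (hypothetical, not implemented): hash only
    # packets with a new CT state, store the selected next hop MAC in the
    # conntrack entry, and flush entries for that MAC if BFD reports the
    # next hop down.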
Relates to:
- RFE-5549 Persist healthy connections while SPK pods are being scaled up/down (Accepted)
Links to:
- RHBA-2024:138789 ovn24.03 bug fix and enhancement update