-
Bug
-
Resolution: Done-Errata
-
Critical
-
None
When a packet enters a router that has load balancers configured on it, the packet is treated as new and, if it matches a load balancer, is sent straight to that LB's DNAT stage. As a result, reply traffic is handled incorrectly if the original packet was SNAT'ed by the router. In the OVNK case we have a gateway router per node, GR_<node name>, which SNATs packets that egress towards rtoe and has load balancers that match on the node IP and balance to a pod, so:
br-ex--rtoe---(172.18.0.3)GR_ovn-worker-----.....---pod
Let's assume GR_ovn-worker has a nodeport service (a load balancer on the node IP) for 172.18.0.3:32000.
The pod sends an egress packet with source port 32000. The packet gets SNAT'ed to 172.18.0.3. When a reply comes back, OVN should first send this packet to zone 0 to check whether it is a reply packet and whether unSNAT is necessary. Instead, it skips this step and sends the packet straight to the LB DNAT stage.
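The collision described above can be illustrated with a small toy model. This is only a sketch, not OVN code: the pod, remote, and backend addresses, the `Packet` type, and all function names are hypothetical, and a plain dictionary stands in for the SNAT conntrack zone. It shows that the same reply packet reaches the original pod when unSNAT runs first, but is load balanced to a service backend when the router goes straight to LB DNAT.

```python
from dataclasses import dataclass

NODE_IP = "172.18.0.3"   # node IP from the report
NODEPORT = 32000         # nodeport from the report

# Hypothetical endpoints, chosen only for illustration.
POD = ("10.244.0.5", 32000)      # pod that happens to use source port 32000
REMOTE = ("203.0.113.10", 443)   # external peer
BACKEND = ("10.244.0.9", 8080)   # backend pod of the nodeport service


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int


# Stand-in for the SNAT conntrack zone: reply tuple -> original pod endpoint.
snat_conntrack = {}


def egress_snat(pkt):
    """SNAT an egress packet to the node IP and remember the mapping."""
    reply_key = (pkt.dst_ip, pkt.dst_port, NODE_IP, pkt.src_port)
    snat_conntrack[reply_key] = (pkt.src_ip, pkt.src_port)
    return Packet(NODE_IP, pkt.src_port, pkt.dst_ip, pkt.dst_port)


def un_snat(pkt):
    """Return the packet with its original destination restored, or None."""
    orig = snat_conntrack.get((pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port))
    if orig is None:
        return None
    return Packet(pkt.src_ip, pkt.src_port, orig[0], orig[1])


def lb_dnat(pkt):
    """DNAT to the service backend if the packet matches the nodeport VIP."""
    if (pkt.dst_ip, pkt.dst_port) == (NODE_IP, NODEPORT):
        return Packet(pkt.src_ip, pkt.src_port, BACKEND[0], BACKEND[1])
    return None


def ingress(pkt, unsnat_first):
    """Process an ingress packet, optionally checking for SNAT replies first."""
    if unsnat_first:
        restored = un_snat(pkt)
        if restored is not None:
            return restored
    balanced = lb_dnat(pkt)
    return balanced if balanced is not None else pkt


def demo():
    """Send one egress packet, then process its reply in both orders."""
    snat_conntrack.clear()
    egress_snat(Packet(POD[0], POD[1], REMOTE[0], REMOTE[1]))
    reply = Packet(REMOTE[0], REMOTE[1], NODE_IP, NODEPORT)
    return ingress(reply, unsnat_first=True), ingress(reply, unsnat_first=False)
```

Calling `demo()` returns two packets: the first has the pod as its destination (expected behavior), the second has the service backend as its destination (the behavior reported here).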
Working with dceara@redhat.com we found the origin of this to be:
https://github.com/ovn-org/ovn/commit/832893bdbb42fd121f0b8de90514e27a20356507
Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=1815217
https://bugzilla.redhat.com/show_bug.cgi?id=1815217#c19
Dumitru tested without the patch and the original problem is no longer present, so it seems safe to revert 832893.
Note that the issue described in this bug does not happen when OVNK is leveraging LB templates. This is because the IP for the LB VIP is a string template called "^NODEIP_IPv4_0", which does not match the northd check that decides whether or not to skip the unSNAT stage.
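One way to picture why a template VIP escapes the check is the sketch below. It assumes, purely for illustration, that the check compares the LB VIP against the router's SNAT address as parsed IP addresses; the function name `vip_matches_snat_ip` is hypothetical and this is not northd's actual code.

```python
import ipaddress


def vip_matches_snat_ip(vip, snat_ip):
    """Hypothetical check: does the LB VIP equal the router's SNAT IP?

    A VIP that is not a literal IP address (for example a template
    name) can never compare equal, so it is reported as no match.
    """
    try:
        return ipaddress.ip_address(vip) == ipaddress.ip_address(snat_ip)
    except ValueError:
        return False
```

Under this assumption, `vip_matches_snat_ip("172.18.0.3", "172.18.0.3")` is true, so the unSNAT stage would be skipped, while `vip_matches_snat_ip("^NODEIP_IPv4_0", "172.18.0.3")` is false, so the unSNAT stage would still run.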
- clones
-
FDP-291 OVN routers should unSNAT before attempting to load balance
- Closed
- links to
-
RHBA-2024:129084 ovn23.09 bug fix and enhancement update