  1. Fast Datapath Product
  2. FDP-1698

ECMP symmetric reply might not work for first packets of egress traffic

    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • None
    • ovn24.03
    • None
    • 13
    • False
    • None
    • False
    • Given an OVN logical router LR with ECMP symmetric reply enabled, ≥2 next-hops toward a destination logical switch port (LSP_dst), and traffic originating from LSP_src,

      When a new egress flow from LSP_src to LSP_dst starts and ECMP membership (count/order of next-hops) changes before conntrack is committed by the return ACK,

      Then all pre-commit packets from LSP_src are forwarded via a stable, single next-hop that matches the post-commit path.

    • rhel-9
    • None
    • rhel-net-ovn
    • ssg_networking

       Problem Description: 

      ECMP symmetric reply might not work for the first packets of egress traffic, until an ACK for the first packet is received.

      Suppose Alice and Bob have ECMP symmetric reply enabled between them, and Alice has two routes, R1 and R2, towards Bob.
      Ingress traffic, initiated by Bob, is OK: packets from Alice are sent back through the same route that Bob's packets arrived on.

      However, egress traffic, initiated by Alice, might have some issues:
      Alice => Bob: SYN.
      Bob => Alice: SYN ACK. Results in ct_inv as the SYN was not committed.
      Alice => Bob: ACK. No ct entry -> route still based on hash.
      Alice => Bob: Packet1. No ct entry -> route still based on hash.
      Bob => Alice: ACK. Connection is committed.
      Alice => Bob: Packet2. ct entry -> uses same route as traffic from Bob.

      Hence, if the number of routes (and hence the hash result) changes between the SYN from Alice and the ACK of the first packet(s), the route might change.
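      The sequence above can be sketched as a small simulation. This is only an illustrative model, not OVN code: the CRC32-modulo hash, the `walk` helper, the `routes_by_step` callback and the `bob_route` parameter are all assumptions made for the example. It shows that every packet Alice sends before the commit is routed by hash over the current route set, so a change in that set can move them to another route.

```python
import zlib

# Packet exchange from the description, in order.
SEQUENCE = [
    ("alice", "SYN"),
    ("bob", "SYN ACK"),   # ct_inv: the SYN was not committed
    ("alice", "ACK"),
    ("alice", "Packet1"),
    ("bob", "ACK"),       # the connection is committed here
    ("alice", "Packet2"),
]
COMMIT_ON = ("bob", "ACK")


def hash_route(flow, routes):
    """Illustrative ECMP choice: CRC32 of the flow id modulo the route count."""
    return routes[zlib.crc32(flow.encode()) % len(routes)]


def walk(flow, routes_by_step, bob_route):
    """Return (label, route) for each packet Alice sends.

    routes_by_step(step) gives the route list in effect at that step.
    bob_route is the route Bob's committing ACK is assumed to arrive on.
    """
    ct_route = None  # no conntrack entry until the commit
    used = []
    for step, (sender, label) in enumerate(SEQUENCE):
        if sender == "bob":
            if (sender, label) == COMMIT_ON:
                ct_route = bob_route
            continue
        if ct_route is not None:
            route = ct_route  # symmetric reply: follow Bob's route
        else:
            route = hash_route(flow, routes_by_step(step))  # hash only
        used.append((label, route))
    return used


if __name__ == "__main__":
    # A third route appears just before Packet1 (step index 3).
    def grow(step):
        return ["R1", "R2"] if step < 3 else ["R1", "R2", "R3"]

    for label, route in walk("alice->bob:tcp/80", grow, bob_route="R1"):
        print(label, route)
```

      In this model only Packet2 is pinned to the route used by Bob's traffic; SYN, ACK and Packet1 depend on whatever route set is in effect when each of them is sent.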

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      Connection tracking would not work for packets sent through a different path.

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      OVN main, ovn-24.03

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      The issue has probably existed since ECMP symmetric reply was introduced.

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      Depends on how quickly the ACK from Bob is sent.

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

       

       Expected Behavior: Describe what should happen under normal circumstances.

      Packets from Alice should not change path when, e.g., the number of routes changes.

       Observed Behavior: Explain what actually happens.

      Initial packets from Alice are sent on a route that depends on the hash of the flow, and hence might be sent on a different route when the number of routes changes.
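      To give a rough sense of how often a route-count change moves a flow, here is a minimal sketch that assumes a plain hash-modulo selection (CRC32 is a stand-in; the real selection mechanism in OVN/OVS may differ, and `route_index` and `changed_fraction` are hypothetical helpers). With a uniform hash h, going from 2 to 3 routes keeps the same index only when h mod 6 is 0 or 1, so about two thirds of flows would be expected to move.

```python
import zlib


def route_index(flow_id: str, n_routes: int) -> int:
    """Illustrative selection: CRC32 of the flow id modulo the route count."""
    return zlib.crc32(flow_id.encode()) % n_routes


def changed_fraction(n_flows: int, before: int, after: int) -> float:
    """Fraction of synthetic flows whose route index differs between
    a route count of `before` and a route count of `after`."""
    changed = sum(
        route_index(f"flow-{i}", before) != route_index(f"flow-{i}", after)
        for i in range(n_flows)
    )
    return changed / n_flows


if __name__ == "__main__":
    print(changed_fraction(10_000, before=2, after=3))
```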

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs, please provide them (e.g. sos report, /var/log/openvswitch/*, testpmd console)

              ovnteam@redhat.com OVN Team
              xsimonar@redhat.com Xavier Simonart