  1. Fast Datapath Product
  2. FDP-1766

CLONE [ovn26.03 fast-datapath-rhel-10] - Too many ACL sampled packets when traffic traverses logical routers

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • None
    • None
    • ovn26.03
    • 1
    • False
    • False
      Given stateful ACLs with "sample_est" are applied on logical switches that are linked by an OVN logical router,

      When a packet from an established flow crosses that router,

      Then the packet is sampled once per ACL only (no duplicate samples appear because of the router hop).

    • ovn26.03-26.03.0-alpha.69.el10fdp
    • rhel-10
    • None
    • rhel-net-ovn
    • ssg_networking

       Problem Description:

      Originally reported upstream: https://mail.openvswitch.org/pipermail/ovs-discuss/2025-May/053626.html

      With topologies similar to the following:

      switch afc9bff5-0b20-43c7-8ef7-9bc65e9a19ee (ls2)
          port ls2-rtr
              type: router
              router-port: rtr-ls2
          port vm3
              addresses: ["00:00:00:00:00:03"]
      switch c9c171ef-849c-436d-b3f9-73d83b9c4e5d (ls)
          port ls-rtr
              type: router
              addresses: ["00:00:00:00:01:00"]
              router-port: rtr-ls
          port vm1
              addresses: ["00:00:00:00:00:01"]
      router cd90bc9c-5ed1-403f-a3c7-1923ef069360 (rtr)
          port rtr-ls
              mac: "00:00:00:00:01:00"
              ipv6-lla: "fe80::200:ff:fe00:100"
              networks: ["42.42.42.1/24", "4242::1/64"]
          port rtr-ls2
              mac: "00:00:00:00:02:00"
              ipv6-lla: "fe80::200:ff:fe00:200"
              networks: ["43.43.43.1/24"] 

      That is, two logical switches connected by a logical router.

      And the following ACLs applied to both switches:

      # ovn-nbctl acl-list pg
      from-lport   100 (inport==@pg && ip4) allow-related
        to-lport   200 (outport==@pg && ip4 && icmp4) allow-related 

      If sampling of packets that are part of established sessions is configured for both ACLs:

      # ovn-nbctl list sample
      _uuid               : 23153fae-0a73-4f86-bdf2-137e76647da8
      collectors          : [82540855-dcd4-44e4-8354-e08a972500cd]
      metadata            : 2000000

      _uuid               : 42391c82-23d2-4f2b-a7b9-88afaa68282c
      collectors          : [82540855-dcd4-44e4-8354-e08a972500cd]
      metadata            : 1000000
      # ovn-nbctl --columns action,direction,match,sample_new,sample_est list acl 
      action              : allow-related
      direction           : from-lport
      match               : "inport==@pg && ip4"
      sample_new          : []
      sample_est          : 23153fae-0a73-4f86-bdf2-137e76647da8

      action              : allow-related
      direction           : to-lport
      match               : "outport==@pg && ip4 && icmp4"
      sample_new          : []
      sample_est          : 42391c82-23d2-4f2b-a7b9-88afaa68282c 

      Reply packets on established sessions are sampled twice:

      • once in the egress pipeline of ls (towards the router)
      • once in the egress pipeline of ls2 (towards the destination)

      The first sample should not happen: router ports are skipped from conntrack, so there should be no conntrack entry associated with the packet in the egress pipeline (from switch to router). However, if the switch has stateful ACLs, the function skip_port_from_conntrack() does not clear the CT information that was associated with the packet in the ingress pipeline. That has been the case since https://github.com/ovn-org/ovn/commit/d17ece7:

      Also, this patch does not change the behavior for ACLs such as allow-related:
      packets are still sent to conntrack, even for router ports. While this does
      not work if router ports are distributed, allow-related ACLs work today on
      router ports when those ports are handled on the same chassis for ingress and
      egress traffic. This patch does not change that behavior. 
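
      The explanation above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not OVN code: the names SwitchHop, REPLY_PATH and samples_for_reply are hypothetical, the model assumes that a packet carrying established CT state is sampled with the from-lport ACL's sample_est in ingress and the to-lport ACL's sample_est in egress, and it assumes no CT state survives the router hop. The clear_ct_on_router_port flag stands for a hypothetical behavior in which stale CT state is dropped when a router port is skipped from conntrack.

```python
from dataclasses import dataclass

# Sample metadata values taken from the "ovn-nbctl list sample" output above.
FROM_LPORT_EST = 2000000  # sample_est of the from-lport ACL
TO_LPORT_EST = 1000000    # sample_est of the to-lport ACL


@dataclass(frozen=True)
class SwitchHop:
    """One logical switch traversed by the packet (toy model, not OVN code)."""
    switch: str
    in_is_router_port: bool
    out_is_router_port: bool


# ICMP reply path for "ping 42.42.42.2" issued from vm3:
# vm1 -> ls -> rtr -> ls2 -> vm3.
REPLY_PATH = [
    SwitchHop("ls", in_is_router_port=False, out_is_router_port=True),
    SwitchHop("ls2", in_is_router_port=True, out_is_router_port=False),
]


def samples_for_reply(clear_ct_on_router_port):
    """Return the (stage, metadata) samples the toy model emits for one reply."""
    samples = []
    for hop in REPLY_PATH:
        # Assumption: no CT state is carried over from the router pipeline.
        ct_est = False

        # Ingress: only non-router ports send the packet to conntrack.
        if not hop.in_is_router_port:
            ct_est = True
        if ct_est:
            samples.append((hop.switch + " ingress", FROM_LPORT_EST))

        # Egress: router ports are skipped from conntrack. Unless the stale
        # ingress CT state is cleared, the packet still looks established.
        if hop.out_is_router_port:
            if clear_ct_on_router_port:
                ct_est = False
        else:
            ct_est = True
        if ct_est:
            samples.append((hop.switch + " egress", TO_LPORT_EST))
    return samples


# Behavior described in this report: stale CT state is kept.
print(samples_for_reply(clear_ct_on_router_port=False))
# Hypothetical behavior: stale CT state is cleared on the router port.
print(samples_for_reply(clear_ct_on_router_port=True))
```

      In this model the extra sample comes from the egress stage of ls, where the packet leaves through the router port while still carrying the established state picked up in ingress.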

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

       

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      OVN 24.03 or newer.

        Issue Type: this is a day-one issue

       

       Reproducibility: this can be consistently reproduced

       

       Reproduction Steps: with the attached configuration, ping once from "vm3":

      ip netns exec vm3 ping 42.42.42.2 -c 1 

       Expected Behavior:

      There should be two samples of the ICMP reply packet (one for each ACL).

       

       Observed Behavior:

      Three samples are actually generated.
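
      A check for the expected versus observed behavior could look like the sketch below. It assumes the samples collected for the single ICMP reply have already been reduced to a list of their metadata values. How they are collected is outside this sketch, and the helper name duplicated_samples is hypothetical. The observed list is illustrative: it assumes the extra sample carries the to-lport ACL's metadata (1000000), as the two egress-pipeline hits described above suggest.

```python
from collections import Counter


def duplicated_samples(observed_metadata):
    """Return {metadata: count} for every metadata value seen more than once."""
    counts = Counter(observed_metadata)
    return {meta: n for meta, n in counts.items() if n > 1}


# Illustrative input only: one entry per sample collected for the ICMP reply.
expected = [2000000, 1000000]           # one sample per ACL
observed = [2000000, 1000000, 1000000]  # three samples, as reported

print(duplicated_samples(expected))
print(duplicated_samples(observed))
```

      An empty result means every ACL sampled the packet at most once. Any entry in the result points at the metadata value that was sampled more than once.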

              Lorenzo Bianconi (lorenzobianconi)
              OVN Team (ovnteam@redhat.com)
              Ehsan Elahi
