  1. Fast Datapath Product
  2. FDP-3124

"ACL with 'pass' action + lower priority 'drop' ACL: stateful tracking does not apply after LB DNAT"

    • Bug
    • Resolution: Not a Bug
    • ovn25.09
    • rhel-9
    • rhel-net-ovn
    • ssg_networking

       Problem Description: Clearly explain the issue.

      https://ovn-kubernetes.io/okeps/okep-5224-connecting-udns/okep-5224-connecting-udns/#services

      is being implemented here: https://github.com/ovn-kubernetes/ovn-kubernetes/pull/5828 

      As part of this feature, we are adding hierarchical ACLs.

      There are 3 ACLs, all in tier 0:

      • a `pass` ACL at priority 500 that matches traffic destined to the service CIDR
      • a `pass` ACL at priority 475 that matches traffic destined to the same network
      • a `drop` ACL at priority 450 that matches traffic destined to the network CIDRs held in the address set used in the match

      example ACLs:

      _uuid               : 2ba0a82a-399d-4caf-af5e-0a3ab359361d
      action              : drop
      direction           : from-lport
      external_ids        : {"k8s.ovn.org/id"="ovnkube-network-connect-controller:ClusterNetworkConnect:test-cnc-mg5kq:drop-pod", "k8s.ovn.org/name"=test-cnc-mg5kq, "k8s.ovn.org/owner-controller"=ovnkube-network-connect-controller, "k8s.ovn.org/owner-type"=ClusterNetworkConnect, type=drop-pod}
      label               : 0
      log                 : false
      match               : "ip4.dst == $a10198858957931958511 || ip6.dst == $a10198861156955214933"
      meter               : acl-logging
      name                : []
      options             : {}
      priority            : 450
      sample_est          : []
      sample_new          : []
      severity            : []
      tier                : 0

      ===

      _uuid               : aafe9da6-cc2e-4179-9269-3ec11382960d
      action              : pass
      direction           : from-lport
      external_ids        : {"k8s.ovn.org/id"="ovnkube-network-connect-controller:ClusterNetworkConnect:test-cnc-mg5kq:allow-same-network-15", "k8s.ovn.org/name"=test-cnc-mg5kq, "k8s.ovn.org/owner-controller"=ovnkube-network-connect-controller, "k8s.ovn.org/owner-type"=ClusterNetworkConnect, type=allow-same-network-15}
      label               : 0
      log                 : false
      match               : "ip4.dst == 10.128.0.0/16 || ip6.dst == 2014:100:200::/60"
      meter               : acl-logging
      name                : []
      options             : {}
      priority            : 475
      sample_est          : []
      sample_new          : []
      severity            : []
      tier                : 0

      ===

      _uuid               : 81d6582f-b56d-4e4a-8410-cbf7e7bf7fdf
      action              : pass
      direction           : from-lport
      external_ids        : {"k8s.ovn.org/id"="ovnkube-network-connect-controller:ClusterNetworkConnect:test-cnc-mg5kq:allow-service", "k8s.ovn.org/name"=test-cnc-mg5kq, "k8s.ovn.org/owner-controller"=ovnkube-network-connect-controller, "k8s.ovn.org/owner-type"=ClusterNetworkConnect, type=allow-service}
      label               : 0
      log                 : false
      match               : "(ip4.dst == 10.96.0.0/16 || ip6.dst == fd00:10:96::/112)"
      meter               : acl-logging
      name                : []
      options             : {}
      priority            : 500
      sample_est          : []
      sample_new          : []
      severity            : []
      tier                : 0

      ==
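To make the intended evaluation order concrete, here is a minimal Python sketch of how the three tier-0 ACLs above are expected to be evaluated for the IPv4 case. It is a simplified model, not OVN code. The address-set contents (`CONNECTED_SUBNETS`) are a hypothetical stand-in, since the real set is not shown in this issue. The sketch also assumes that no later tier contains ACLs and that the default verdict is allow, so a `pass` match falls through to that default.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Acl:
    name: str
    priority: int
    action: str        # "pass" or "drop"
    dst_cidrs: tuple   # destination CIDRs covered by the match


# Hypothetical address-set contents; the real set is not shown in the issue.
CONNECTED_SUBNETS = ("10.128.0.0/16", "10.129.0.0/16")

# IPv4 halves of the three example ACLs, all in tier 0.
TIER0 = [
    Acl("allow-service", 500, "pass", ("10.96.0.0/16",)),
    Acl("allow-same-network-15", 475, "pass", ("10.128.0.0/16",)),
    Acl("drop-pod", 450, "drop", CONNECTED_SUBNETS),
]


def evaluate(dst: str, acls=TIER0, default: str = "allow") -> str:
    """Return the verdict of the highest-priority ACL matching dst."""
    addr = ip_address(dst)
    for acl in sorted(acls, key=lambda a: a.priority, reverse=True):
        if any(addr in ip_network(cidr) for cidr in acl.dst_cidrs):
            # "pass" stops evaluation in this tier; with no later tier
            # in this sketch, the default verdict applies.
            return default if acl.action == "pass" else acl.action
    return default


if __name__ == "__main__":
    for dst in ("10.96.88.94", "10.128.3.4", "10.129.0.5"):
        print(dst, evaluate(dst))
```

Under these assumptions, a service VIP and a same-network pod address both hit a `pass` ACL before the drop is considered, and only a destination in another connected subnet reaches the `drop` ACL.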

      The expectation is that a curl to a service works, but it does not work today. When we add `ct.new` to the match of the drop ACL, things work as expected, but `ct.new` shouldn't be required at all.

      Traffic to service VIPs is being dropped by the drop ACL: without `ct.new`, the drop matches ALL traffic destined to the connected subnets, including DNAT'd return traffic from service VIPs. The flow is:

      • The pod sends to the service VIP (e.g., 10.96.x.x) → the allow-service ACL passes it (priority 500)
      • The OVN LB DNATs the VIP to the backend pod IP (e.g., 10.128.x.x)
      • The DNAT'd packet now has dst == 10.128.x.x (a pod subnet), but the ACL is not applied at this point because apply-after-lb is set to false
      • The reply should come back along the same path
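The ordering described in the steps above can be sketched as a toy model in Python. It only illustrates which destination address a from-lport ACL would see depending on whether it runs before or after load balancing; it is not how OVN implements its pipeline. The backend address `10.129.0.5` and the connected subnet `10.129.0.0/16` are hypothetical values chosen for illustration.

```python
from ipaddress import ip_address, ip_network

SERVICE_CIDR = ip_network("10.96.0.0/16")
CONNECTED = ip_network("10.129.0.0/16")   # hypothetical address-set member
LB = {"10.96.88.94": "10.129.0.5"}        # VIP -> backend (backend is hypothetical)


def acl_verdict(dst: str) -> str:
    """Simplified tier-0 lookup: service pass first, then connected-subnet drop."""
    addr = ip_address(dst)
    if addr in SERVICE_CIDR:
        return "pass"
    if addr in CONNECTED:
        return "drop"
    return "no-match"


def from_lport(dst: str, apply_after_lb: bool) -> tuple:
    """Return (verdict, destination address the ACL was evaluated against)."""
    if apply_after_lb:
        # ACL runs after the load balancer, so it sees the DNAT'd address.
        dst = LB.get(dst, dst)
    # Otherwise the ACL runs before DNAT and sees the original destination.
    return acl_verdict(dst), dst


if __name__ == "__main__":
    print(from_lport("10.96.88.94", apply_after_lb=False))
    print(from_lport("10.96.88.94", apply_after_lb=True))
```

In this model, with `apply_after_lb=False` the ACL is evaluated against the VIP and the `pass` applies, which is the behavior the report expects for the request direction.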

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      The workaround is to add `ct.new` to the match of the drop ACL:

      _uuid               : 2ba0a82a-399d-4caf-af5e-0a3ab359361d
      action              : drop
      direction           : from-lport
      external_ids        : {"k8s.ovn.org/id"="ovnkube-network-connect-controller:ClusterNetworkConnect:test-cnc-mg5kq:drop-pod", "k8s.ovn.org/name"=test-cnc-mg5kq, "k8s.ovn.org/owner-controller"=ovnkube-network-connect-controller, "k8s.ovn.org/owner-type"=ClusterNetworkConnect, type=drop-pod}
      label               : 0
      log                 : false
      match               : "(ip4.dst == $a10198858957931958511 || ip6.dst == $a10198861156955214933) && ct.new"
      meter               : acl-logging
      name                : []
      options             : {}
      priority            : 450
      sample_est          : []
      sample_new          : []
      severity            : []
      tier                : 0
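As a small illustration of how the workaround match can be composed, the Python helper below wraps an existing match in parentheses before appending `ct.new`. The helper name `with_ct_new` is hypothetical. The parentheses are added on the assumption that OVN's match syntax does not accept `&&` and `||` mixed at the same level without explicit grouping.

```python
def with_ct_new(match: str) -> str:
    """Restrict an ACL match to new connections, grouping the original match."""
    return f"({match}) && ct.new"


# Match of the drop-pod ACL as shown in this issue.
DROP_MATCH = "ip4.dst == $a10198858957931958511 || ip6.dst == $a10198861156955214933"

if __name__ == "__main__":
    print(with_ct_new(DROP_MATCH))
```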

      $ oc rsh -n black-ns0-4xzlx black-pod-0
      / $ curl http://10.96.88.94:8080/hostname --max-time 5
      curl: (28) Operation timed out after 5002 milliseconds with 0 bytes received
      / $ curl http://10.96.88.94:8080/hostname --max-time 5
      curl: (28) Operation timed out after 5001 milliseconds with 0 bytes received

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      25.09

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

       

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

       

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

       

       Expected Behavior: Describe what should happen under normal circumstances.

       

       Observed Behavior: Explain what actually happens.

       

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)

              ovnteam@redhat.com OVN Team
              sseethar Surya Seetharaman
              OVN QE OVN QE