  1. Fast Datapath Product
  2. FDP-1713

ovn-controller may perform very CPU-heavy computations when translating != prefix matches to OpenFlow

    • Bug
    • Resolution: Unresolved
    • Major
    • OVN
    • 13

      Given an ACL in NBDB equivalent to "ip4.dst == 0.0.0.0/0 && ip4.dst != {N prefixes}" with large N,

      When ovn-controller translates it to OpenFlow,

      Then the translation completes within a bounded time without increasing the number of generated OpenFlow rules compared to current behavior.

    • rhel-9
    • rhel-net-ovn
    • ssg_networking
    • Important

       Problem Description: Clearly explain the issue.

      This is a known problem. Because OpenFlow has no support for !=, when translating matches of the form "ip.dst != <prefix>/<mask>" ovn-controller has to generate rules that match the remainder of the address space, which may take multiple rules.
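
      To illustrate why one != match can expand into several rules, here is a minimal Python sketch that covers the remainder of the IPv4 address space with plain prefix matches. It uses the standard ipaddress module; the helper name complement_prefixes is hypothetical, and this is not the algorithm ovn-controller uses, so the set of rules OVN actually generates may differ.

```python
import ipaddress


def complement_prefixes(excluded, universe="0.0.0.0/0"):
    """Return the prefixes that cover `universe` minus `excluded`.

    Hypothetical helper for illustration only; ovn-controller's own
    translation of != may produce a different set of matches.
    """
    space = ipaddress.ip_network(universe)
    hole = ipaddress.ip_network(excluded)
    # address_exclude() yields the networks left over once `hole`
    # is removed from `space`.
    return sorted(space.address_exclude(hole))


# A few prefixes taken from the ACL quoted in this issue.
for cidr in ("10.0.0.0/8", "100.64.0.0/10", "10.6.153.254/32"):
    rules = complement_prefixes(cidr)
    print(f"ip4.dst != {cidr}: {len(rules)} prefix matches")
```

      With this approach an excluded /n prefix needs n covering prefixes (8, 10 and 32 for the three examples above), and combining many such exclusions in one expression multiplies the work needed to merge them.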

      There has been an attempt to improve ovn's behavior:
      https://github.com/ovn-org/ovn/commit/422ab29

      However, in specific scenarios, the cost of the additional computations that determine and consolidate supersets of sub-expressions seems to be quite high. For example, the following ACL makes ovn-controller spend a lot of time (tens to hundreds of seconds) in crush_or_supersets():

      $ ovn-nbctl acl-list 0b9bd3cb-410e-4f3a-a9a6-62cac898ec2b
      from-lport  1001 (ip4.dst == 0.0.0.0/0 && ip4.dst != {10.6.153.254/32, 10.0.0.0/8, 10.128.8.1/32, 10.129.0.1/32, 10.128.0.1/32, 10.131.0.1/32, 10.130.0.1/32, 10.130.6.1/32, 10.129.4.1/32, 10.129.2.1/32, 10.130.2.1/32, 10.131.4.1/32, 10.128.6.1/32, 10.131.8.1/32, 10.129.6.1/32, 10.131.6.1/32, 10.128.4.1/32, 10.130.4.1/32, 10.129.8.1/32, 100.64.0.0/10, 129.0.1.4/32, 129.0.1.5/32, 129.0.2.153/32, 144.42.16.0/24, 144.42.27.0/24, 144.42.28.0/24, 144.42.3.0/24, 144.42.34.0/24, 144.42.56.0/24, 169.254.0.0/16, 170.40.0.0/17, 172.16.0.0/12, 144.42.16.0/24, 144.42.27.0/24, 144.42.28.0/24, 144.42.3.0/24, 144.42.34.0/24, 144.42.56.0/24, 152.161.230.196/30, 152.162.200.128/30, 152.181.136.88/30, 152.181.52.4/30, 152.181.57.100/30, 152.181.57.112/30, 152.181.57.92/30, 152.181.58.12/30, 152.181.59.208/30, 152.181.60.44/30, 152.181.60.56/30, 152.183.126.220/30}) allow-related [after-lb]
      

      The goal of the issue is to investigate potential ways of further optimizing ovn-controller's behavior.

      Note: It might be acceptable for ovn-controller to generate the same number of OpenFlow rules as it does today in this case, but it would be great if it used less CPU to compute them, if possible.
       

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      In specific scenarios with OCP ipBlock + except network policies this can cause all networking to be down for the cluster:

      OCPBUGS-61823
       

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      All supported ovn streams.
       

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      This is not a regression; it is a leftover from https://github.com/ovn-org/ovn/commit/422ab29
       

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

       

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      An even simpler way of reproducing this is to use ovstest:

      echo 'ip4.dst != 152.183.126.220/30 && ip4.dst != 162.213.152.0/23 && ip4.dst != 165.79.15.0/24 && ip4.dst != 165.79.232.16/28 && ip4.dst != 165.79.254.0/24 && ip4.dst != 165.79.3.0/24 && ip4.dst != 169.254.0.0/16 && ip4.dst != 170.40.0.0/17 && ip4.dst != 172.16.0.0/12 && ip4.dst != 191.237.22.167/32 && ip4.dst != 192.168.0.0/16 && ip4.dst != 198.105.201.0/24' | ./tests/ovstest test-ovn expr-to-flows
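
      To see how the time grows with the number of excluded prefixes, the same reproducer can be driven from a small Python script. This is only a sketch: build_expr and time_expr_to_flows are hypothetical helper names, the ./tests/ovstest path assumes the script runs from an OVN build directory as in the command above, and counting non-empty output lines as flows is an assumption about the tool's output format.

```python
import subprocess
import time

# The prefixes from the ovstest reproducer in this issue.
PREFIXES = [
    "152.183.126.220/30", "162.213.152.0/23", "165.79.15.0/24",
    "165.79.232.16/28", "165.79.254.0/24", "165.79.3.0/24",
    "169.254.0.0/16", "170.40.0.0/17", "172.16.0.0/12",
    "191.237.22.167/32", "192.168.0.0/16", "198.105.201.0/24",
]


def build_expr(prefixes, field="ip4.dst"):
    """Join the prefixes into 'field != p1 && field != p2 && ...'."""
    return " && ".join(f"{field} != {p}" for p in prefixes)


def time_expr_to_flows(prefixes, ovstest="./tests/ovstest"):
    """Feed the expression to 'ovstest test-ovn expr-to-flows'.

    Returns (elapsed seconds, number of non-empty output lines).
    Treating each non-empty line as one flow is an assumption.
    """
    expr = build_expr(prefixes)
    start = time.perf_counter()
    result = subprocess.run(
        [ovstest, "test-ovn", "expr-to-flows"],
        input=expr + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    elapsed = time.perf_counter() - start
    lines = [line for line in result.stdout.splitlines() if line.strip()]
    return elapsed, len(lines)
```

      Calling time_expr_to_flows(PREFIXES[:n]) for increasing n gives a rough picture of where the translation time starts to climb, and the returned line count can be compared before and after any optimization.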
      

       

       Expected Behavior: Describe what should happen under normal circumstances.

      Hopefully ovn-controller can be optimized to spend less time consolidating supersets of expressions.
       

       Observed Behavior: Explain what actually happens.

      In this case ovn-controller spends a lot of time in crush_or_supersets() (hundreds of seconds in total).
       

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)

              ovnteam@redhat.com OVN Team
              dceara@redhat.com Dumitru Ceara
              Jianlin Shi Jianlin Shi