FDP-2120

CLONE [ovn25.03 fast-datapath-rhel-10] - Upstream: ovn-controller may perform very cpu-heavy computations translating != prefix matches to openflow

    • 2
    • False
    • False
    • Please mark each item below with ( / ) if completed or ( x ) if incomplete:
      ( ) Unit test or integration test cases are written and pass successfully
      ( ) The upstream pull request is merged upstream and passes CI
    • ovn25.03-25.03.1-93.el10fdp
    • rhel-10
    • None
    • rhel-net-ovn
    • OVN FDP Sprint 12
    • 1
    • Important
    • +

      This is tracking the upstream effort needed to deliver the solution to the bug described below.


       Problem Description: Clearly explain the issue.

      This is a known problem. Because OpenFlow does not support !=, when translating matches of the form "ip4.dst != <prefix>/<mask>" ovn-controller has to generate rules that match the remainder of the address space, potentially as multiple rules.
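      To illustrate why a single != match can turn into several positive matches, here is a minimal Python sketch that lists the prefixes covering everything except one excluded prefix. It uses the standard ipaddress module and is only an illustration of the address-space arithmetic, not ovn-controller's code; the helper name complement() is hypothetical.

```python
import ipaddress


def complement(prefix):
    """Return the IPv4 prefixes that together cover every address
    outside `prefix` (illustrative helper, not part of OVN)."""
    everything = ipaddress.ip_network("0.0.0.0/0")
    excluded = ipaddress.ip_network(prefix)
    # address_exclude() yields the sibling prefix at each level of the
    # address tree on the path from 0.0.0.0/0 down to `excluded`.
    return sorted(everything.address_exclude(excluded))


for net in complement("10.0.0.0/8"):
    print(net)
```

      Excluding a prefix of length N leaves N positive prefixes, so a /8 needs 8 matches and a /32 needs 32.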

      There has been an attempt to improve ovn's behavior:
      https://github.com/ovn-org/ovn/commit/422ab29

      However, in specific scenarios, the cost of the additional computations that determine and consolidate supersets of sub-expressions seems to be quite high. For example, the following ACL makes ovn-controller spend a long time (tens to hundreds of seconds) in crush_or_supersets():

      $ ovn-nbctl acl-list 0b9bd3cb-410e-4f3a-a9a6-62cac898ec2b
      from-lport  1001 (ip4.dst == 0.0.0.0/0 && ip4.dst != {10.6.153.254/32, 10.0.0.0/8, 10.128.8.1/32, 10.129.0.1/32, 10.128.0.1/32, 10.131.0.1/32, 10.130.0.1/32, 10.130.6.1/32, 10.129.4.1/32, 10.129.2.1/32, 10.130.2.1/32, 10.131.4.1/32, 10.128.6.1/32, 10.131.8.1/32, 10.129.6.1/32, 10.131.6.1/32, 10.128.4.1/32, 10.130.4.1/32, 10.129.8.1/32, 100.64.0.0/10, 129.0.1.4/32, 129.0.1.5/32, 129.0.2.153/32, 144.42.16.0/24, 144.42.27.0/24, 144.42.28.0/24, 144.42.3.0/24, 144.42.34.0/24, 144.42.56.0/24, 169.254.0.0/16, 170.40.0.0/17, 172.16.0.0/12, 144.42.16.0/24, 144.42.27.0/24, 144.42.28.0/24, 144.42.3.0/24, 144.42.34.0/24, 144.42.56.0/24, 152.161.230.196/30, 152.162.200.128/30, 152.181.136.88/30, 152.181.52.4/30, 152.181.57.100/30, 152.181.57.112/30, 152.181.57.92/30, 152.181.58.12/30, 152.181.59.208/30, 152.181.60.44/30, 152.181.60.56/30, 152.183.126.220/30}) allow-related [after-lb]
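      The exclusion list above contains exact duplicates (for example 144.42.16.0/24 appears twice) and /32 entries that already fall inside 10.0.0.0/8. Since "not in A and not in B" equals "not in A" whenever B lies inside A, such entries do not change what the ACL matches. The following sketch, which assumes nothing about ovn-controller internals and uses only Python's ipaddress module, shows how a few of the listed prefixes collapse; consolidate() is a hypothetical helper name.

```python
import ipaddress


def consolidate(prefixes):
    """Drop duplicate and already-covered prefixes (and merge adjacent
    ones) so the union of addresses stays the same."""
    nets = (ipaddress.ip_network(p) for p in prefixes)
    return list(ipaddress.collapse_addresses(nets))


# A small subset of the prefixes from the ACL above.
sample = [
    "10.6.153.254/32",
    "10.0.0.0/8",
    "10.128.8.1/32",
    "100.64.0.0/10",
    "144.42.16.0/24",
    "144.42.16.0/24",
]

for net in consolidate(sample):
    print(net)
```

      Whether such pre-processing is done anywhere in the stack today is not stated in this report; the sketch only shows that the set of excluded addresses can be described with fewer terms.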
      

      The goal of the issue is to investigate potential ways of further optimizing ovn-controller's behavior.

      Note: It may be acceptable for ovn-controller to generate the same number of OpenFlow rules as it does today in this case, but ideally it would use less CPU to compute them.
       

       Impact Assessment: Describe the severity and impact (e.g., network down,availability of a workaround, etc.).

      In specific scenarios involving OCP network policies that use ipBlock with except, this can bring down all networking for the cluster:

      OCPBUGS-61823
       

       Software Versions: Specify the exact versions in use (e.g.,openvswitch3.1-3.1.0-147.el8fdp).

      All supported ovn streams.
       

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      This is not a regression; it is a leftover from https://github.com/ovn-org/ovn/commit/422ab29
       

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

       

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      An even simpler way of reproducing this is to use ovstest:

      echo 'ip4.dst != 152.183.126.220/30 && ip4.dst != 162.213.152.0/23 && ip4.dst != 165.79.15.0/24 && ip4.dst != 165.79.232.16/28 && ip4.dst != 165.79.254.0/24 && ip4.dst != 165.79.3.0/24 && ip4.dst != 169.254.0.0/16 && ip4.dst != 170.40.0.0/17 && ip4.dst != 172.16.0.0/12 && ip4.dst != 191.237.22.167/32 && ip4.dst != 192.168.0.0/16 && ip4.dst != 198.105.201.0/24' | ./tests/ovstest test-ovn expr-to-flows
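      As a rough way to see why this input is expensive, the sketch below counts the terms in the expression and computes a naive upper bound on the number of combinations. It rests on two assumptions that are not taken from the OVN source: that a "!= prefix/len" term can be written as len positive alternatives (one per prefix bit that may differ), and that ANDing the terms naively multiplies those alternatives before any consolidation. The real intermediate sizes inside ovn-controller may be very different.

```python
import ipaddress
import math
import re

# The expression from the reproducer above, split only for readability.
EXPR = (
    "ip4.dst != 152.183.126.220/30 && ip4.dst != 162.213.152.0/23 && "
    "ip4.dst != 165.79.15.0/24 && ip4.dst != 165.79.232.16/28 && "
    "ip4.dst != 165.79.254.0/24 && ip4.dst != 165.79.3.0/24 && "
    "ip4.dst != 169.254.0.0/16 && ip4.dst != 170.40.0.0/17 && "
    "ip4.dst != 172.16.0.0/12 && ip4.dst != 191.237.22.167/32 && "
    "ip4.dst != 192.168.0.0/16 && ip4.dst != 198.105.201.0/24"
)


def prefix_lengths(expr):
    """Return the prefix length of every 'ip4.dst != a.b.c.d/len' term."""
    found = re.findall(r"ip4\.dst != (\d+\.\d+\.\d+\.\d+/\d+)", expr)
    return [ipaddress.ip_network(p).prefixlen for p in found]


lengths = prefix_lengths(EXPR)
# Assumed model: one alternative per prefix bit, multiplied across terms.
naive_combinations = math.prod(lengths)

print("terms:", len(lengths))
print("alternatives per term:", lengths)
print("naive combinations before consolidation:", naive_combinations)
```

      Even if the final flow count is small, a bound of this size suggests why any step that compares intermediate sub-expressions against each other can dominate the run time.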
      

       

       Expected Behavior: Describe what should happen under normal circumstances.

      Hopefully ovn-controller can be optimized to spend less time consolidating supersets of expressions.
       

       Observed Behavior: Explain what actually happens.

      ovn-controller spends lots of time in this case in crush_or_supersets() (hundreds of seconds in total).
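      To put a number on the time spent for a given expression, the ovstest reproducer can be wrapped in a small timing helper. This is a sketch that assumes it is run from an OVN build tree where ./tests/ovstest exists; the helper name time_expr_to_flows() is hypothetical, and it measures the wall-clock time of the whole command rather than crush_or_supersets() alone.

```python
import subprocess
import time

# Command taken from the reproducer in this report; the path assumes
# the current directory is an OVN build tree.
OVSTEST_ARGV = ["./tests/ovstest", "test-ovn", "expr-to-flows"]


def time_expr_to_flows(expr, argv=OVSTEST_ARGV):
    """Feed `expr` to the command on stdin and return
    (elapsed_seconds, stdout)."""
    start = time.perf_counter()
    proc = subprocess.run(
        argv,
        input=expr + "\n",
        text=True,
        capture_output=True,
        check=True,  # raise if the command exits with an error
    )
    elapsed = time.perf_counter() - start
    return elapsed, proc.stdout
```

      For example, calling time_expr_to_flows("ip4.dst != 10.0.0.0/8") before and after a candidate fix gives a simple comparison of run time and of the generated flows.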
       

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)


              ovn-qe OVN QE
              ovnteam@redhat.com OVN Team
              Aniss Loughlam