OCPBUGS-45052 (OpenShift Bugs)

OVN slowness when processing address_set



      Description of problem:
      We are seeing slow flow processing in the OVN component.
      ovn-controller logs warnings about an unreasonably long poll interval:

      2024-11-21T15:02:55.828110047Z 2024-11-21T15:02:55.828Z|34418|timeval|WARN|faults: 7244 minor, 0 major
      2024-11-21T15:02:55.828110047Z 2024-11-21T15:02:55.828Z|34419|timeval|WARN|context switches: 0 voluntary, 130 involuntary
      2024-11-21T15:03:02.414584307Z 2024-11-21T15:03:02.414Z|34426|timeval|WARN|Unreasonably long 6586ms poll interval (6522ms user, 6ms system)
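To quantify these warnings across a full log, a small parser can pull out the total, user, and system times from each `Unreasonably long ... poll interval` line. This is a hypothetical helper, not part of OVN; it only assumes the message format shown above. On the sample line, almost all of the 6586 ms is user time (6522 ms), which suggests ovn-controller is CPU-bound in its own computation rather than waiting on the kernel.

```python
import re

# Matches the timeval warning format shown in the logs above, e.g.
# "Unreasonably long 6586ms poll interval (6522ms user, 6ms system)"
POLL_RE = re.compile(
    r"Unreasonably long (\d+)ms poll interval \((\d+)ms user, (\d+)ms system\)"
)


def parse_poll_intervals(lines):
    """Return (total_ms, user_ms, system_ms) for each long-poll warning."""
    results = []
    for line in lines:
        m = POLL_RE.search(line)
        if m:
            results.append(tuple(int(g) for g in m.groups()))
    return results


if __name__ == "__main__":
    sample = [
        "2024-11-21T15:03:02.414Z|34426|timeval|WARN|Unreasonably long 6586ms "
        "poll interval (6522ms user, 6ms system)",
        "2024-11-21T15:02:55.828Z|34419|timeval|WARN|context switches: "
        "0 voluntary, 130 involuntary",
    ]
    for total, user, system in parse_poll_intervals(sample):
        # prints: 6586ms total, 99% in user space
        print(f"{total}ms total, {user / total:.0%} in user space")
```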
      

      It also reports long processing times for address set (addr_sets) changes:

      2024-11-20T13:15:49.639Z|04506|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5718ms
      2024-11-20T12:16:48.853Z|01458|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5674ms
      2024-11-21T19:08:34.969Z|30044|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5168ms
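The `inc_proc_eng` messages above name the engine node (`logical_flow_output`), the input that triggered the handler (`addr_sets`), and the duration. A hypothetical filter like the one below, which assumes only that message format, can collect the slow handlers from a log so they can be compared over time; for the three sample lines the average is 5520 ms.

```python
import re

# Matches incremental-processing messages such as
# "node: logical_flow_output, handler for input addr_sets took 5718ms"
HANDLER_RE = re.compile(r"node: (\w+), handler for input (\w+) took (\d+)ms")


def slow_handlers(lines, threshold_ms=1000):
    """Yield (node, input, duration_ms) for handlers at or above threshold_ms."""
    for line in lines:
        m = HANDLER_RE.search(line)
        if m and int(m.group(3)) >= threshold_ms:
            yield m.group(1), m.group(2), int(m.group(3))


if __name__ == "__main__":
    sample = [
        "2024-11-20T13:15:49.639Z|04506|inc_proc_eng|INFO|node: logical_flow_output, "
        "handler for input addr_sets took 5718ms",
        "2024-11-20T12:16:48.853Z|01458|inc_proc_eng|INFO|node: logical_flow_output, "
        "handler for input addr_sets took 5674ms",
        "2024-11-21T19:08:34.969Z|30044|inc_proc_eng|INFO|node: logical_flow_output, "
        "handler for input addr_sets took 5168ms",
    ]
    durations = [ms for _, _, ms in slow_handlers(sample)]
    print(f"{len(durations)} slow addr_sets handlers, "
          f"average {sum(durations) / len(durations):.0f}ms")
```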
      

      and consumes a lot of CPU:

      2024-11-21T15:01:57.318808175Z 2024-11-21T15:01:57.318Z|34408|timeval|WARN|faults: 6118 minor, 0 major
      2024-11-21T15:01:57.318808175Z 2024-11-21T15:01:57.318Z|34409|timeval|WARN|context switches: 0 voluntary, 166 involuntary
      2024-11-21T15:01:57.318808175Z 2024-11-21T15:01:57.318Z|34410|poll_loop|INFO|Dropped 151 log messages in last 25 seconds (most recently, 24 seconds ago) due to excessive rate
      2024-11-21T15:01:57.318857875Z 2024-11-21T15:01:57.318Z|34411|poll_loop|INFO|wakeup due to [POLLIN] on fd 23 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (100% CPU usage)
      2024-11-21T15:01:57.318857875Z 2024-11-21T15:01:57.318Z|34412|poll_loop|INFO|wakeup due to [POLLIN][POLLOUT] on fd 25 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (100% CPU usage)
      2024-11-21T15:01:57.323090184Z 2024-11-21T15:01:57.323Z|34413|poll_loop|INFO|wakeup due to [POLLIN] on fd 24 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
      2024-11-21T15:01:57.327049793Z 2024-11-21T15:01:57.327Z|34414|poll_loop|INFO|wakeup due to [POLLOUT] on fd 25 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (100% CPU usage)
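The `poll_loop` lines record which connection woke ovn-controller up while it was at 100% CPU: here the southbound database socket (`ovnsb_db.sock`) and the `br-int.mgmt` socket that ovn-controller uses to program OpenFlow flows on the integration bridge. A hypothetical tally per socket, assuming only the message format shown above, looks like this:

```python
import re
from collections import Counter

# Matches poll_loop messages such as
# "wakeup due to [POLLIN][POLLOUT] on fd 25 (<->/var/run/openvswitch/br-int.mgmt)
#  at lib/stream-fd.c:153 (100% CPU usage)"
WAKEUP_RE = re.compile(
    r"wakeup due to ((?:\[\w+\])+) on fd (\d+) \(<->([^)]+)\).*?\((\d+)% CPU usage\)"
)


def busy_wakeups(lines, min_cpu=90):
    """Count wakeups per socket path logged while CPU usage was >= min_cpu percent."""
    counts = Counter()
    for line in lines:
        m = WAKEUP_RE.search(line)
        if m and int(m.group(4)) >= min_cpu:
            counts[m.group(3)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "poll_loop|INFO|wakeup due to [POLLIN] on fd 23 (<->/var/run/ovn/ovnsb_db.sock) "
        "at lib/stream-fd.c:157 (100% CPU usage)",
        "poll_loop|INFO|wakeup due to [POLLIN][POLLOUT] on fd 25 "
        "(<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (100% CPU usage)",
    ]
    for path, n in busy_wakeups(sample).most_common():
        print(f"{n:3d}  {path}")
```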
      

      Version-Release number of selected component (if applicable):
      OpenShift 4.14.40 (OVN 23.09.4-16.el9fdp)

      We suspect a scaling limit. The cluster is running 11k pods, 9k of which match the same network policy label(s), which results in a very large address_set.
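To make the scaling concern concrete, the following is a simplified model of how a policy's label selector turns into address-set membership. It assumes that every pod selected by the policy contributes its IP as one member, which is roughly how OVN-Kubernetes populates address sets for NetworkPolicy peers, but it is not the actual implementation. The `Pod` type, the `app=web` label, and the IP range are hypothetical. With 11k pods of which 9k carry the label, the set has 9k members, and a change to any of those pods touches that one large set.

```python
import ipaddress
import itertools
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    ip: str
    labels: dict = field(default_factory=dict)


def matches(selector, labels):
    # matchLabels semantics: every key/value pair in the selector must be on the pod;
    # an empty selector therefore matches every pod.
    return all(labels.get(k) == v for k, v in selector.items())


def address_set_for(selector, pods):
    # Hypothetical simplification: one address-set member per selected pod IP.
    return {p.ip for p in pods if matches(selector, p.labels)}


def make_pods(total, matching):
    # Hypothetical cluster: the first `matching` pods carry app=web, the rest no labels.
    hosts = ipaddress.ip_network("10.128.0.0/14").hosts()
    pods = []
    for i, ip in enumerate(itertools.islice(hosts, total)):
        labels = {"app": "web"} if i < matching else {}
        pods.append(Pod(f"pod-{i}", str(ip), labels))
    return pods


if __name__ == "__main__":
    pods = make_pods(total=11_000, matching=9_000)
    members = address_set_for({"app": "web"}, pods)
    print(len(members))  # 9000
```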

      How reproducible:
      Always

      Steps to Reproduce:
      1. Install OpenShift 4.14.40.
      2. Create label-based network policies.
      3. Create pods carrying the labels selected by those policies and scale them to (x)k.
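The reproduction steps can be sketched as a script that emits the required manifests as a Kubernetes `List` in JSON, suitable for `oc apply -f -`. Everything specific here is an assumption not taken from the report: the label `app: np-scale`, the resource names, the single ingress rule, and the `registry.k8s.io/pause:3.9` image. The "(x)k" scale is passed in as `replicas`.

```python
import json

# Hypothetical label shared by the NetworkPolicy selectors and the pods.
LABELS = {"app": "np-scale"}


def network_policy(name="allow-from-np-scale"):
    # Selects the labeled pods and allows ingress only from other labeled pods,
    # so the policy's peer set grows with every replica.
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            "podSelector": {"matchLabels": LABELS},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {"matchLabels": LABELS}}]}],
        },
    }


def deployment(replicas, name="np-scale"):
    # Lightweight pods carrying the policy's label; scale via `replicas`.
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": LABELS},
            "template": {
                "metadata": {"labels": LABELS},
                "spec": {
                    "containers": [
                        {"name": "pause", "image": "registry.k8s.io/pause:3.9"}
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [network_policy(), deployment(replicas=9000)],
    }
    print(json.dumps(manifests, indent=2))
```

Piping the output into `oc apply -f -` in a test namespace creates both objects; the replica count can then be raised step by step with `oc scale` while watching the ovn-controller logs.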

      Actual results:
      At some point, OVN starts logging warnings about long delays.

      Expected results:
      OVN is able to handle this number of pods/ACLs/flows.

      Additional info:
      This appears to be similar to a previously fixed issue, FDP-509.

              Nadia Pinaeva
              Franck Grosjean
              Anurag Saxena
