- Bug
- Resolution: Unresolved
- Major
- None
- 4.14.z
- Important
- None
- False
Description of problem:
We are experiencing slow flow processing in the OVN component.
The ovn-controller logs warnings about an unreasonably long poll interval:
2024-11-21T15:02:55.828110047Z 2024-11-21T15:02:55.828Z|34418|timeval|WARN|faults: 7244 minor, 0 major
2024-11-21T15:02:55.828110047Z 2024-11-21T15:02:55.828Z|34419|timeval|WARN|context switches: 0 voluntary, 130 involuntary
2024-11-21T15:03:02.414584307Z 2024-11-21T15:03:02.414Z|34426|timeval|WARN|Unreasonably long 6586ms poll interval (6522ms user, 6ms system)
It also reports that the handler for addr_sets takes a long time to run:
2024-11-20T13:15:49.639Z|04506|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5718ms
2024-11-20T12:16:48.853Z|01458|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5674ms
2024-11-21T19:08:34.969Z|30044|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5168ms
and it consumes a lot of CPU:
2024-11-21T15:01:57.318808175Z 2024-11-21T15:01:57.318Z|34408|timeval|WARN|faults: 6118 minor, 0 major
2024-11-21T15:01:57.318808175Z 2024-11-21T15:01:57.318Z|34409|timeval|WARN|context switches: 0 voluntary, 166 involuntary
2024-11-21T15:01:57.318808175Z 2024-11-21T15:01:57.318Z|34410|poll_loop|INFO|Dropped 151 log messages in last 25 seconds (most recently, 24 seconds ago) due to excessive rate
2024-11-21T15:01:57.318857875Z 2024-11-21T15:01:57.318Z|34411|poll_loop|INFO|wakeup due to [POLLIN] on fd 23 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (100% CPU usage)
2024-11-21T15:01:57.318857875Z 2024-11-21T15:01:57.318Z|34412|poll_loop|INFO|wakeup due to [POLLIN][POLLOUT] on fd 25 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (100% CPU usage)
2024-11-21T15:01:57.323090184Z 2024-11-21T15:01:57.323Z|34413|poll_loop|INFO|wakeup due to [POLLIN] on fd 24 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
2024-11-21T15:01:57.327049793Z 2024-11-21T15:01:57.327Z|34414|poll_loop|INFO|wakeup due to [POLLOUT] on fd 25 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (100% CPU usage)
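For anyone triaging a larger log capture, the two symptoms above (long poll intervals and slow addr_sets handlers) can be pulled out with a short script. This is only a sketch: the regular expressions are derived from the message formats quoted in this report, and the function name `summarize` and the embedded sample are illustrative, not part of any OVN tooling.

```python
import re

# Patterns based on the ovn-controller messages quoted in this report.
POLL_RE = re.compile(r"Unreasonably long (\d+)ms poll interval")
HANDLER_RE = re.compile(r"node: (\w+), handler for input (\w+) took (\d+)ms")


def summarize(log_text):
    """Return (poll_intervals_ms, handler_times_ms) found in log_text.

    handler_times_ms maps (node, input) to a list of durations in ms.
    """
    polls = [int(m.group(1)) for m in POLL_RE.finditer(log_text)]
    handlers = {}
    for m in HANDLER_RE.finditer(log_text):
        key = (m.group(1), m.group(2))
        handlers.setdefault(key, []).append(int(m.group(3)))
    return polls, handlers


# Sample taken from the excerpts in this report.
SAMPLE = """\
2024-11-21T15:03:02.414Z|34426|timeval|WARN|Unreasonably long 6586ms poll interval (6522ms user, 6ms system)
2024-11-20T13:15:49.639Z|04506|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5718ms
2024-11-20T12:16:48.853Z|01458|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5674ms
2024-11-21T19:08:34.969Z|30044|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5168ms
"""

if __name__ == "__main__":
    polls, handlers = summarize(SAMPLE)
    print("poll intervals (ms):", polls)
    for (node, inp), times in handlers.items():
        print(f"{node} <- {inp}: max {max(times)}ms over {len(times)} samples")
```

Because the patterns are matched with `finditer` over the whole text, the script does not depend on each log entry being on its own line.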
Version-Release number of selected component (if applicable):
OpenShift 4.14.40 (23.09.4-16.el9fdp)
We suspect a scaling limit. The cluster is running 11k pods, of which 9k match the same network policy label(s), which creates a very large address_set.
How reproducible:
Always
Steps to Reproduce:
1. Install OpenShift 4.14.40
2. Create label-based network policies
3. Create and scale pods with the previously set labels to (x)k
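Steps 2 and 3 could be scripted roughly as follows. This is a minimal sketch, not the exact setup used on the affected cluster: the namespace `scale-test`, the label `app=scale-test`, the object names, the `pause` image, and the helper `build_manifests` are all assumptions chosen for illustration. It builds one label-based NetworkPolicy and one Deployment whose pods all carry the selected label, so that every replica lands in the same address set.

```python
import json


def build_manifests(namespace, labels, replicas):
    """Build a NetworkPolicy and a Deployment that share the same pod labels.

    All names here are hypothetical; adjust them to the cluster under test.
    """
    policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-label", "namespace": namespace},
        "spec": {
            # The policy applies to pods with these labels ...
            "podSelector": {"matchLabels": labels},
            "policyTypes": ["Ingress"],
            # ... and allows ingress only from pods with the same labels.
            "ingress": [{"from": [{"podSelector": {"matchLabels": labels}}]}],
        },
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "labelled-pods", "namespace": namespace},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        # Placeholder image (assumption); any small image works.
                        {"name": "pause", "image": "registry.k8s.io/pause:3.9"}
                    ]
                },
            },
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [policy, deployment]}


if __name__ == "__main__":
    manifests = build_manifests("scale-test", {"app": "scale-test"}, 9000)
    print(json.dumps(manifests, indent=2))
```

The JSON output could then be applied with `oc apply -f -` and the replica count raised in stages while watching the ovn-controller logs for the warnings described above.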
Actual results:
At some point OVN starts logging warnings about long delays.
Expected results:
OVN is able to handle this number of pods/ACLs/flows.
Additional info:
This seems similar to a previously fixed issue, FDP-509.