Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.14.z
Component/s: Networking / ovn-kubernetes
Labels:
- OVN-Kubernetes
- SDN:OVNK:NetworkPolicy

Severity: Important
Regression: None
Blocked: False
Blocked Reason: None

Description of problem:
We are seeing slow flow processing in the OVN component.
ovn-controller logs warnings about an unreasonably long poll interval:

2024-11-21T15:02:55.828110047Z 2024-11-21T15:02:55.828Z|34418|timeval|WARN|faults: 7244 minor, 0 major
2024-11-21T15:02:55.828110047Z 2024-11-21T15:02:55.828Z|34419|timeval|WARN|context switches: 0 voluntary, 130 involuntary
2024-11-21T15:03:02.414584307Z 2024-11-21T15:03:02.414Z|34426|timeval|WARN|Unreasonably long 6586ms poll interval (6522ms user, 6ms system)
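
To quantify these warnings across a full log, a small parser can extract the poll interval and its user/system split. This is only a sketch: the regular expression is derived from the single warning line shown above, and the function name `parse_poll_intervals` is hypothetical.

```python
import re

# Pattern derived from the timeval warning quoted in this report.
POLL_RE = re.compile(
    r"Unreasonably long (\d+)ms poll interval \((\d+)ms user, (\d+)ms system\)"
)


def parse_poll_intervals(lines):
    """Return one (total_ms, user_ms, system_ms) tuple per matching log line."""
    results = []
    for line in lines:
        match = POLL_RE.search(line)
        if match:
            results.append(tuple(int(group) for group in match.groups()))
    return results


sample = [
    "2024-11-21T15:03:02.414584307Z 2024-11-21T15:03:02.414Z|34426|timeval|WARN|"
    "Unreasonably long 6586ms poll interval (6522ms user, 6ms system)",
]

for total, user, system in parse_poll_intervals(sample):
    # A high user share suggests the time is spent computing, not waiting on I/O.
    print(f"poll interval {total}ms, user CPU share {user / total:.2f}")
```

In the sample line almost all of the interval is user time, which is consistent with ovn-controller being busy computing rather than blocked.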

It also reports long processing times for the addr_sets input:

2024-11-20T13:15:49.639Z|04506|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5718ms
2024-11-20T12:16:48.853Z|01458|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5674ms
2024-11-21T19:08:34.969Z|30044|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5168ms
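
The handler timings can be summarized per engine node and input with a short script. This is a sketch that assumes every `inc_proc_eng` line follows the format of the three lines above; `handler_durations` is a hypothetical helper name.

```python
import re
from statistics import mean

# Pattern derived from the inc_proc_eng lines quoted in this report.
HANDLER_RE = re.compile(
    r"inc_proc_eng\|INFO\|node: ([^,]+), handler for input (\S+) took (\d+)ms"
)


def handler_durations(lines):
    """Group handler durations in ms by (node, input) pair."""
    durations = {}
    for line in lines:
        match = HANDLER_RE.search(line)
        if match:
            key = (match.group(1), match.group(2))
            durations.setdefault(key, []).append(int(match.group(3)))
    return durations


sample = [
    "2024-11-20T13:15:49.639Z|04506|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5718ms",
    "2024-11-20T12:16:48.853Z|01458|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5674ms",
    "2024-11-21T19:08:34.969Z|30044|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 5168ms",
]

for (node, inp), values in handler_durations(sample).items():
    print(f"{node} <- {inp}: n={len(values)} mean={mean(values)}ms max={max(values)}ms")
```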

and it consumes a lot of CPU:

2024-11-21T15:01:57.318808175Z 2024-11-21T15:01:57.318Z|34408|timeval|WARN|faults: 6118 minor, 0 major
2024-11-21T15:01:57.318808175Z 2024-11-21T15:01:57.318Z|34409|timeval|WARN|context switches: 0 voluntary, 166 involuntary
2024-11-21T15:01:57.318808175Z 2024-11-21T15:01:57.318Z|34410|poll_loop|INFO|Dropped 151 log messages in last 25 seconds (most recently, 24 seconds ago) due to excessive rate
2024-11-21T15:01:57.318857875Z 2024-11-21T15:01:57.318Z|34411|poll_loop|INFO|wakeup due to [POLLIN] on fd 23 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (100% CPU usage)
2024-11-21T15:01:57.318857875Z 2024-11-21T15:01:57.318Z|34412|poll_loop|INFO|wakeup due to [POLLIN][POLLOUT] on fd 25 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (100% CPU usage)
2024-11-21T15:01:57.323090184Z 2024-11-21T15:01:57.323Z|34413|poll_loop|INFO|wakeup due to [POLLIN] on fd 24 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)
2024-11-21T15:01:57.327049793Z 2024-11-21T15:01:57.327Z|34414|poll_loop|INFO|wakeup due to [POLLOUT] on fd 25 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (100% CPU usage)
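
To see which connection drives the wakeups, the `poll_loop` lines can be counted per socket. A sketch, assuming the wakeup lines always have the shape shown above; `wakeups_by_socket` is a hypothetical name. Note that the log itself reports dropped messages due to rate limiting, so these counts are a lower bound.

```python
import re
from collections import Counter

# Pattern derived from the poll_loop wakeup lines quoted in this report.
WAKEUP_RE = re.compile(
    r"wakeup due to (\S+) on fd (\d+) \(<->([^)]+)\) at \S+ \((\d+)% CPU usage\)"
)


def wakeups_by_socket(lines):
    """Count wakeup log lines per socket path."""
    counts = Counter()
    for line in lines:
        match = WAKEUP_RE.search(line)
        if match:
            counts[match.group(3)] += 1
    return counts


sample = [
    "2024-11-21T15:01:57.318Z|34411|poll_loop|INFO|wakeup due to [POLLIN] on fd 23 (<->/var/run/ovn/ovnsb_db.sock) at lib/stream-fd.c:157 (100% CPU usage)",
    "2024-11-21T15:01:57.318Z|34412|poll_loop|INFO|wakeup due to [POLLIN][POLLOUT] on fd 25 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (100% CPU usage)",
    "2024-11-21T15:01:57.323Z|34413|poll_loop|INFO|wakeup due to [POLLIN] on fd 24 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:157 (100% CPU usage)",
    "2024-11-21T15:01:57.327Z|34414|poll_loop|INFO|wakeup due to [POLLOUT] on fd 25 (<->/var/run/openvswitch/br-int.mgmt) at lib/stream-fd.c:153 (100% CPU usage)",
]

for socket_path, count in wakeups_by_socket(sample).most_common():
    print(f"{count} wakeups on {socket_path}")
```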

Version-Release number of selected component (if applicable):
OpenShift 4.14.40 (23.09.4-16.el9fdp)

We suspect a scaling limit. The cluster is running 11k pods, 9k of which match the same network policy label(s), which creates a very large address_set.

How reproducible:
Always

Steps to Reproduce:
1. Install OpenShift 4.14.40
2. Create label-based network policies
3. Create and scale pods with the previously set labels to (x)k
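
Steps 2 and 3 could be scripted along the following lines. This is an illustrative sketch only: the report does not include the actual policies, so the policy shape (ingress allowed from pods carrying the same label), the names `repro-policy` and `repro-pods`, the label `app=repro`, the image, and the replica count are all assumptions, not the customer's configuration.

```python
import json


def network_policy(name, label_key, label_value):
    """Build a NetworkPolicy that selects pods by label and allows ingress
    from pods carrying the same label (assumed shape, see lead-in)."""
    selector = {"matchLabels": {label_key: label_value}}
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            "podSelector": selector,
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": selector}]}],
        },
    }


def deployment(name, label_key, label_value, replicas, image):
    """Build a Deployment whose pods all carry the policy label."""
    labels = {label_key: label_value}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": "pause", "image": image}]},
            },
        },
    }


# All values below are placeholders chosen for illustration.
manifests = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        network_policy("repro-policy", "app", "repro"),
        deployment("repro-pods", "app", "repro", replicas=100,
                   image="registry.k8s.io/pause:3.9"),
    ],
}

print(json.dumps(manifests, indent=2))
```

The printed JSON can be piped to `oc apply -f -` in a test namespace, and the replica count raised step by step while watching the ovn-controller log for the warnings above.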

Actual results:
At some point OVN starts logging warnings about long delays

Expected results:
OVN is able to handle this number of pods/ACLs/flows

Additional info:
This seems similar to a previously fixed issue, FDP-509

Is related to: FDP-509 ovn-controller slow update of set of address_sets (Closed)

Links to: OVN Logs

Assignee: Nadia Pinaeva
Reporter: Franck Grosjean
QA Contact: Anurag Saxena
Votes: 1
Watchers: 7
Created: 2024/11/26 12:33 PM
Updated: 2025/01/23 12:16 PM
