-
Bug
-
Resolution: Duplicate
Running OVN from branch-23.09 with a large number of ACLs ends up creating a large number of meter objects:
# ovn-sbctl list meter | grep -c _uuid
90017
# ovn-sbctl list meter_band | grep -c _uuid
90017
# ovn-sbctl list address_set | grep -c _uuid
33038
# ovn-sbctl list port_group | grep -c _uuid
816
Even when no other operation runs in parallel, it takes a long time (10 s) for external_ids:ovn-installed=true to be set on an interface, and ovn-controller uses 100% CPU during that time.
The CPU usage is caused by the complexity of ofctrl_meter_bands_sync, which is itself called in a loop:
https://github.com/ovn-org/ovn/blob/68acb363cad9932f3cec14bc402c39bd343d024d/controller/ofctrl.c#L2739
and which iterates over every row of the meter table, doing a strcmp each time to locate the right row.
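To illustrate why this pattern gets expensive, here is a minimal sketch (not the OVN code) that contrasts a linear strcmp scan with a name-indexed lookup. Every name in it (struct meter_row, lookup_linear, meter_index_build, lookup_indexed) is hypothetical and only stands in for the meter table rows. It assumes the caller looks up each meter by name once, which may not match the real call pattern exactly.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a row of the southbound Meter table. */
struct meter_row {
    const char *name;
    struct meter_row *next_in_bucket;   /* Chain link used by the index. */
};

/* Linear lookup: one strcmp per row until the name matches.  Looking up
 * every one of n meters this way costs about n * n / 2 comparisons. */
static const struct meter_row *
lookup_linear(const struct meter_row *rows, size_t n, const char *name,
              unsigned long long *n_cmp)
{
    for (size_t i = 0; i < n; i++) {
        (*n_cmp)++;
        if (!strcmp(rows[i].name, name)) {
            return &rows[i];
        }
    }
    return NULL;
}

/* FNV-1a string hash (chosen for brevity; any string hash would do). */
static uint32_t
hash_name(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char) *s;
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical index: chained hash table keyed by meter name. */
struct meter_index {
    struct meter_row **buckets;
    size_t n_buckets;                   /* Always a power of two. */
};

/* Builds the index in one pass over the rows.  Returns 0 on success,
 * -1 if memory allocation fails. */
static int
meter_index_build(struct meter_index *idx, struct meter_row *rows, size_t n)
{
    size_t n_buckets = 1;
    while (n_buckets < n * 2) {
        n_buckets *= 2;
    }
    idx->buckets = calloc(n_buckets, sizeof *idx->buckets);
    if (!idx->buckets) {
        return -1;
    }
    idx->n_buckets = n_buckets;
    for (size_t i = 0; i < n; i++) {
        size_t b = hash_name(rows[i].name) & (n_buckets - 1);
        rows[i].next_in_bucket = idx->buckets[b];
        idx->buckets[b] = &rows[i];
    }
    return 0;
}

/* Indexed lookup: strcmp only against rows that share the bucket. */
static const struct meter_row *
lookup_indexed(const struct meter_index *idx, const char *name,
               unsigned long long *n_cmp)
{
    size_t b = hash_name(name) & (idx->n_buckets - 1);
    for (const struct meter_row *r = idx->buckets[b]; r;
         r = r->next_in_bucket) {
        (*n_cmp)++;
        if (!strcmp(r->name, name)) {
            return r;
        }
    }
    return NULL;
}
```

Under that assumption, 90017 meters would need on the order of 90017 * 90018 / 2, roughly 4 billion, strcmp calls per pass with the linear scan, which is consistent with __strcmp_avx2 dominating the profile. The indexed variant needs one pass to build the table and then a handful of comparisons per lookup.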
"perf top" at the time:
73.77% libc.so.6 [.] __strcmp_avx2
18.31% ovn-controller [.] ovsdb_idl_next_row
6.63% ovn-controller [.] ofctrl_meter_bands_sync
A few core dumps taken during the high CPU usage also show ovn-controller stuck in ofctrl_meter_bands_sync.
With line 2739 above commented out, the ovn-installed=true annotation is added in 0.05s.
This is reproduced with ovn23.09-23.09.0-91.el9fdp.x86_64 and branch-23.09.
The problem was initially seen on ovn-kubernetes and is reported in https://access.redhat.com/support/cases/#/case/03746492
Because egressfirewall is in use, address_set rows change every few seconds, so ovn-controller is seen constantly consuming CPU on the node.
- clones: FDP-399 ovn-controller hangs with a lot of meters (Closed)