-
Bug
-
Resolution: Done-Errata
-
Major
-
None
-
8
-
False
-
-
False
-
FDP 24.D, FDP 24.E
-
Customer Escalated
when processing large address_sets referenced in ACL in such fashion
match : "ip4.src == {$a6538357531468486505, $a7890555686633816021} && outport == @a11713828714540761946"
(with multiple address_set, here with a6538357531468486505 containing 3754 addresses, a7890555686633816021 containing 2 addresses),
when updating the big address_set, ovn-controller hangs and logs
2024-03-13T21:21:40.606Z|6040765|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 1563ms 2024-03-13T21:21:40.853Z|6040766|timeval|WARN|Unreasonably long 1816ms poll interval (1795ms user, 3ms system) 2024-03-13T21:21:40.853Z|6040767|timeval|WARN|faults: 672 minor, 0 major 2024-03-13T21:21:40.853Z|6040768|timeval|WARN|context switches: 0 voluntary, 41 involuntary
in practice on a big ovn-kubernetes deployment with networkpolicies using labels that select a lot of pods, ovn-controller is always busy with 100% CPU.
This only seems to happen when using a set of address_sets, just using ip4.src == {$a6538357531468486505} with a very big address_set, seems way faster.
A reproducer:
# rpm -q --whatprovides /usr/bin/ovn-controller ovn23.09-host-23.09.0-91.el9fdp.x86_64
#!/bin/bash # a large list of addresses echo -n "[" > a for i in `seq 70` ; do for j in `seq 70` ; do echo -n "\"10.0.${i}.${j}\", " >> a done done echo -n " \"10.0.255.255\"]" >> a sed -e 's/.10.0.42.42..//' a > b set -x systemctl stop openvswitch systemctl stop ovn-controller systemctl stop ovn-northd rm /etc/openvswitch/conf.db rm /var/lib/ovn/ovnsb_db.db rm /var/lib/ovn/ovnnb_db.db ip netns del pods set -e systemctl start ovn-northd ovn-sbctl --inactivity-probe=0 set-connection ptcp:6642 ovn-nbctl --wait=hv sync ovn-nbctl ls-add ls1 \ -- create address_set name=address addresses='[]' \ -- create address_set name=address2 addresses='["1.2.3.4", "5.6.7.8"]' for i in `seq 10 99` ; do ovn-nbctl lsp-add ls1 ls1p$i \ -- pg-add pg$i ls1p$i \ -- acl-add pg$i to-lport 1000 "ip4.src=={\$address, \$address2} && outport == @pg${i}" drop \ -- lsp-set-addresses ls1p${i} "00:00:00:01:01:${i} 192.168.1.${i}" \ -- set logical_switch_port ls1p${i} options:requested-chassis=$(cat /etc/openvswitch/system-id.conf) done systemctl start openvswitch systemctl start ovn-controller ovs-vsctl set open . external_ids:ovn-remote=tcp:127.0.0.1:6642 \ external_ids:ovn-encap-type=geneve \ external_ids:ovn-encap-ip=10.224.123.2 \ external_ids:ovn-openflow-probe-interval=240 node=ls1 ovn-sbctl wait-until chassis `cat /etc/openvswitch/system-id.conf` ovs-vsctl wait-until port br-int ovn-nbctl --wait=hv sync ip netns add pods for i in `seq 10 99`; do ovs-vsctl add-port br-int pod$i -- set interface pod$i type=internal -- set interface pod$i external_ids:iface-id=ls1p$i A=$(ovn-nbctl --bare --columns addresses list logical_switch_port ls1p${i}) ip link set pod$i netns pods mac=${A% *} ip=${A#* } ip netns exec pods ip link set pod$i address ${mac} ip netns exec pods ip link set pod$i up ip netns exec pods ip addr add ${ip} dev pod$i done time ovs-vsctl wait-until interface pod99 external_ids:ovn-installed=true for i in `seq 15` ; do ovn-nbctl set address_set address "addresses=`cat a`" && ovn-nbctl --wait=hv sync ovn-nbctl set address_set address "addresses=`cat b`" && time ovn-nbctl --wait=hv sync done
expected result:
update of address_sets are instant
actual result:
slow and those logs in ovn-controller.log:
2024-03-14T04:46:14.546Z|00530|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 896ms 2024-03-14T04:46:14.789Z|00531|timeval|WARN|Unreasonably long 1142ms poll interval (1133ms user, 2ms system) 2024-03-14T04:46:14.789Z|00532|timeval|WARN|disk: 0 reads, 8 writes 2024-03-14T04:46:14.789Z|00533|timeval|WARN|context switches: 0 voluntary, 2 involuntary 2024-03-14T04:46:15.845Z|00534|memory_trim|INFO|Detected inactivity (last active 30065 ms ago): trimming memory 2024-03-14T04:46:15.857Z|00535|timeval|WARN|Unreasonably long 1030ms poll interval (1013ms user, 11ms system) 2024-03-14T04:46:15.857Z|00536|timeval|WARN|context switches: 0 voluntary, 2 involuntary 2024-03-14T04:46:16.782Z|00537|inc_proc_eng|INFO|Dropped 1 log messages in last 1 seconds (most recently, 1 seconds ago) due to excessive rate 2024-03-14T04:46:16.782Z|00538|inc_proc_eng|INFO|node: logical_flow_output, handler for input addr_sets took 901ms 2024-03-14T04:46:17.012Z|00539|timeval|WARN|Unreasonably long 1133ms poll interval (1081ms user, 44ms system)
Discovered while investigation CNI issue with ovn-kubernetes in case 03746492
- links to
-
RHBA-2024:133417 ovn23.09 bug fix and enhancement update