-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.8
-
Quality / Stability / Reliability
-
None
-
None
-
None
-
Important
-
None
-
x86_64
-
None
-
None
-
Rejected
-
None
-
None
-
If docs needed, set a value
-
None
-
None
-
None
-
None
-
None
Description of problem:
sriov-network-operator needs to control device names of NICs it manages. To do so it generates udev rules that assign device name based on the output of the script provided by the operator.
Rules are generated here,
https://github.com/openshift/sriov-network-operator/blob/e969e12f27ab2bcde2a8ef654ee800013d86cd91/pkg/daemon/daemon.go#L993
The problem with above code is that generated rules are too generic. They are executed every time new network device appears. Very likely they get run hundreds of times in cases when worker node starts a lot of PODs, e.g. after rebooting the node.
What makes situation even worse is that because of deficiencies in kernel locking scheme wrt. to rtnl_lock, all workers compete for single lock and kernel doesn't use sleep locking but restarts the syscall (read()) when the worker is unable to take the contended lock. This causes unnecessarily high CPU usage and makes recovery of rebooted node slower than it needs to be.
Version-Release number of selected component (if applicable):
OCP 4.8 but same issue also exists in the latest upstream.
How reproducible:
deterministic
Steps to Reproduce:
1. Deploy the and configure the operator
2. Run debug container on the worker node
3. Inspect generated rules in /etc/udev/rules.d/20-switchdev.rules
Actual results:
Rules in /etc/udev/rules.d/20-switchdev.rules start as follows,
SUBSYSTEM=="net", ACTION=="add|move", ATTR
{phys_port_name}==...Expected results:
Rules should be run only for devices we are interested in. This could be achieved for example using driver match.
Note that network interface device doesn't have necessary driver association but parent physical device (e.g. parent PCI device) has it. So we would need to use DRIVERS= (notice S) match token that matches upwards the devpath. However, due to internal udev implementation detail, plural rule tokens are process after "singular" tokens, i.e. ATTR < DRIVERS. To workaround this and get the desired benefit, i.e. read sysfs attributes like phys_port_name only for devices we are interested in we would need to change generated rules as follows,
ACTION!="add|move", GOTO="switchdev_end"
SUBSYSTEM!="net", GOTO="switchdev_end"
DRIVERS=="...", ENV{.DRIVER_MATCH}="1"
ENV{.DRIVER_MATCH}=="1", ATTR{phys_port_name}
==...
LABEL="switchdev_end"
Additional info:
For more info on kernel locking issue see comments in, https://bugzilla.redhat.com/show_bug.cgi?id=1975356