Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-9053

udev rules generated by sriov-network-operator are too generic

XMLWordPrintable

    • Quality / Stability / Reliability
    • None
    • None
    • None
    • Important
    • None
    • x86_64
    • None
    • None
    • Rejected
    • None
    • None
    • If docs needed, set a value
    • None
    • None
    • None
    • None
    • None

      Description of problem:
      sriov-network-operator needs to control device names of NICs it manages. To do so it generates udev rules that assign device name based on the output of the script provided by the operator.

      Rules are generated here,
      https://github.com/openshift/sriov-network-operator/blob/e969e12f27ab2bcde2a8ef654ee800013d86cd91/pkg/daemon/daemon.go#L993

      The problem with above code is that generated rules are too generic. They are executed every time new network device appears. Very likely they get run hundreds of times in cases when worker node starts a lot of PODs, e.g. after rebooting the node.

      What makes situation even worse is that because of deficiencies in kernel locking scheme wrt. to rtnl_lock, all workers compete for single lock and kernel doesn't use sleep locking but restarts the syscall (read()) when the worker is unable to take the contended lock. This causes unnecessarily high CPU usage and makes recovery of rebooted node slower than it needs to be.

      Version-Release number of selected component (if applicable):
      OCP 4.8 but same issue also exists in the latest upstream.

      How reproducible:
      deterministic

      Steps to Reproduce:
      1. Deploy the and configure the operator
      2. Run debug container on the worker node
      3. Inspect generated rules in /etc/udev/rules.d/20-switchdev.rules

      Actual results:
      Rules in /etc/udev/rules.d/20-switchdev.rules start as follows,

      SUBSYSTEM=="net", ACTION=="add|move", ATTR

      {phys_port_name}==...

      Expected results:
      Rules should be run only for devices we are interested in. This could be achieved for example using driver match.

      Note that network interface device doesn't have necessary driver association but parent physical device (e.g. parent PCI device) has it. So we would need to use DRIVERS= (notice S) match token that matches upwards the devpath. However, due to internal udev implementation detail, plural rule tokens are process after "singular" tokens, i.e. ATTR < DRIVERS. To workaround this and get the desired benefit, i.e. read sysfs attributes like phys_port_name only for devices we are interested in we would need to change generated rules as follows,

      ACTION!="add|move", GOTO="switchdev_end"
      SUBSYSTEM!="net", GOTO="switchdev_end"
      DRIVERS=="...", ENV{.DRIVER_MATCH}="1"
      ENV{.DRIVER_MATCH}=="1", ATTR{phys_port_name}

      ==...
      LABEL="switchdev_end"

      Additional info:
      For more info on kernel locking issue see comments in, https://bugzilla.redhat.com/show_bug.cgi?id=1975356

              bnemeth@redhat.com Balazs Nemeth
              msekleta@redhat.com Michal Sekletar
              None
              None
              Zhanqi Zhao Zhanqi Zhao
              None
              Red Hat Employee
              Votes:
              0 Vote for this issue
              Watchers:
              11 Start watching this issue

                Created:
                Updated: