OpenShift Bugs / OCPBUGS-585

Tuned overwriting IRQBALANCE_BANNED_CPUS


    • Severity: Important
    • Release Note (Done): Previously, restarts of the `tuned` service caused an improper reset of the irqbalance configuration, leading to IRQs being served again on the isolated CPUs and thus violating the isolation guarantees. With this fix, the irqbalance service configuration is preserved across `tuned` service restarts (explicit or caused by bugs), preserving the CPU isolation guarantees with respect to IRQ serving.

      Clone of https://bugzilla.redhat.com/show_bug.cgi?id=2105123 / https://issues.redhat.com/browse/OCPBUGSM-46474

      Description of problem:
      If a pod specifies the performance runtime class with the irq-load-balancing.crio.io: "disable" annotation set, the runtime dynamically adds the CPUs assigned to guaranteed containers to IRQBALANCE_BANNED_CPUS to move interrupts off those CPUs.
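
      For illustration, a pod using this mechanism could look like the sketch below. The runtime class name, image, and resource sizes are assumptions (the actual runtime class name is generated from the performance profile), not values taken from this bug.

      cat <<'EOF' | oc apply -f -
      apiVersion: v1
      kind: Pod
      metadata:
        name: oslat1
        annotations:
          irq-load-balancing.crio.io: "disable"    # tell CRI-O to ban this pod's CPUs from irqbalance
      spec:
        runtimeClassName: performance-performance  # assumed: profile named "performance"
        containers:
        - name: container-perf-tools
          image: quay.io/example/perf-tools:latest # placeholder image
          resources:
            requests:
              cpu: "2"
              memory: "512Mi"
            limits:                                # limits == requests -> guaranteed QoS
              cpu: "2"
              memory: "512Mi"
      EOF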

      I am seeing an intermittent issue where this mask is set to zero even though the runtime hooks were successfully executed after a SNO (single-node OpenShift) reboot.

      In my case, I had two containers in different pods that used this mechanism:

      Jul 07 23:39:05 worker-1.cluster1.savanna.lab.eng.rdu2.redhat.com bash[5776]: time="2022-07-07 23:39:05.163537372Z" level=info msg="Started container" PID=37262 containerID=bee5ff080411de17faac050926209219050fee3c5323100636c3b2983cfdffce description=default/oslat1/container-perf-tools id=ffc52721-9968-49bf-b7f0-3578c2f29ed7 name=/runtime.v1.RuntimeService/StartContainer sandboxID=0ede207f09e8e6a343b69e843cc6460ab26438e2d5979c37352e860e449b5739
      Jul 07 23:39:04 worker-1.cluster1.savanna.lab.eng.rdu2.redhat.com bash[5776]: time="2022-07-07 23:39:04.863831900Z" level=info msg="Disable cpu cfs quota for container \"bee5ff080411de17faac050926209219050fee3c5323100636c3b2983cfdffce\"" id=ffc52721-9968-49bf-b7f0-3578c2f29ed7 name=/runtime.v1.RuntimeService/StartContainer
      Jul 07 23:39:03 worker-1.cluster1.savanna.lab.eng.rdu2.redhat.com bash[5776]: time="2022-07-07 23:39:03.871635816Z" level=info msg="Disable irq smp balancing for container \"bee5ff080411de17faac050926209219050fee3c5323100636c3b2983cfdffce\"" id=ffc52721-9968-49bf-b7f0-3578c2f29ed7 name=/runtime.v1.RuntimeService/StartContainer
      Jul 07 23:39:03 worker-1.cluster1.savanna.lab.eng.rdu2.redhat.com bash[5776]: time="2022-07-07 23:39:03.861160239Z" level=info msg="Disable cpu load balancing for container \"bee5ff080411de17faac050926209219050fee3c5323100636c3b2983cfdffce\"" id=ffc52721-9968-49bf-b7f0-3578c2f29ed7 name=/runtime.v1.RuntimeService/StartContainer
      Jul 07 23:39:03 worker-1.cluster1.savanna.lab.eng.rdu2.redhat.com bash[5776]: time="2022-07-07 23:39:03.861031262Z" level=info msg="Run \"high-performance\" runtime handler pre-start hook for the container \"bee5ff080411de17faac050926209219050fee3c5323100636c3b2983cfdffce\"" id=ffc52721-9968-49bf-b7f0-3578c2f29ed7 name=/runtime.v1.RuntimeService/StartContainer
      Jul 07 23:39:03 worker-1.cluster1.savanna.lab.eng.rdu2.redhat.com bash[5776]: time="2022-07-07 23:39:03.860731655Z" level=info msg="Starting container: bee5ff080411de17faac050926209219050fee3c5323100636c3b2983cfdffce" id=ffc52721-9968-49bf-b7f0-3578c2f29ed7 name=/runtime.v1.RuntimeService/StartContainer

      Jul 07 23:39:03 worker-1.cluster1.savanna.lab.eng.rdu2.redhat.com bash[5776]: time="2022-07-07 23:39:03.849790627Z" level=info msg="Created container bee5ff080411de17faac050926209219050fee3c5323100636c3b2983cfdffce: default/oslat1/container-perf-tools" id=f86d8b9f-749d-4ac7-b47d-44798afdadbc name=/runtime.v1.RuntimeService/CreateContainer

      However:

      cat /etc/sysconfig/irqbalance | grep IRQBALANCE_BANNED_CPUS

      IRQBALANCE_BANNED_CPUS=00000000

      Looking at when the file was updated, it was modified after the hooks above ran:

      stat /etc/sysconfig/irqbalance
      File: /etc/sysconfig/irqbalance
      Size: 1216 Blocks: 8 IO Block: 4096 regular file
      Device: 804h/2052d Inode: 511731398 Links: 1
      Access: (0644/-rw-r--r--) Uid: ( 0/ root) Gid: ( 0/ root)
      Context: system_u:object_r:etc_t:s0
      Access: 2022-07-07 23:40:14.736990599 +0000
      Modify: 2022-07-07 23:40:14.692990600 +0000
      Change: 2022-07-07 23:40:14.692990600 +0000
      Birth: 2022-07-05 11:45:48.743532701 +0000

      systemctl status irqbalance.service
      ● irqbalance.service - irqbalance daemon
      Loaded: loaded (/usr/lib/systemd/system/irqbalance.service; enabled; vendor preset: enabled)
      Active: active (running) since Thu 2022-07-07 23:40:14 UTC; 27min ago
      Main PID: 50761 (irqbalance)
      Tasks: 2 (limit: 2254814)
      Memory: 1.7M
      CPU: 2.701s
      CGroup: /system.slice/irqbalance.service
      └─50761 /usr/sbin/irqbalance --foreground

      Jul 07 23:40:14 worker-1.cluster1.savanna.lab.eng.rdu2.redhat.com systemd[1]: Started irqbalance daemon.

      Then, looking at when tuned applied the performance profile:

      2022-07-07 23:40:08,153 INFO tuned.daemon.daemon: starting tuning
      ....
      2022-07-07 23:40:11,683 INFO tuned.plugins.plugin_sysctl: reapplying system sysctl
      E0707 23:40:14.503688 10464 controller.go:872] unable to sync(daemon/) requeued (7)
      2022-07-07 23:40:14,688 WARNING tuned.plugins.plugin_vm: Option 'transparent_hugepages' is not supported on current hardware.
      2022-07-07 23:40:14,690 INFO tuned.plugins.plugin_systemd: setting 'CPUAffinity' to '0 1 40 41' in the '/etc/systemd/system.conf'
      2022-07-07 23:40:17,829 INFO tuned.plugins.plugin_script: calling script '/usr/lib/tuned/cpu-partitioning/script.sh' with arguments '['start']'
      2022-07-07 23:40:18,110 INFO tuned.plugins.plugin_bootloader: installing additional boot command line parameters to grub2
      2022-07-07 23:40:18,111 INFO tuned.plugins.plugin_bootloader: cannot find grub.cfg to patch
      E0707 23:40:18.114411 10464 controller.go:872] unable to sync(daemon/) requeued (8)
      E0707 23:40:18.114687 10464 controller.go:872] unable to sync(daemon/) requeued (9)
      2022-07-07 23:40:18,115 INFO tuned.daemon.daemon: static tuning from profile 'performance-patch' applied

      The performance profile zeroes out the mask on startup:

      [irqbalance]
      #> Override the value set by cpu-partitioning with an empty one
      banned_cpus=""
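
      For context, the empty value above overrides the banned mask set by the parent cpu-partitioning profile (a sketch; the path and variable are those shipped with tuned's cpu-partitioning profile, so verify them on your build):

      grep -A 1 '^\[irqbalance\]' /usr/lib/tuned/cpu-partitioning/tuned.conf
      # [irqbalance]
      # banned_cpus=${no_balance_cores}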

      So we have a race condition here: if the performance profile is applied after the app container has started, the interrupt affinity is broken.

      This would also happen at steady state if the tuned pod got restarted for any reason.

      This is a critical issue, as it breaks interrupt isolation from guaranteed CPUs.
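
      A quick way to check a node for this breakage (a minimal sketch; the file path is standard, but the expected non-zero mask depends on the profile's isolated CPU set):

      # While a guaranteed pod with the irq-load-balancing.crio.io: "disable"
      # annotation is running, the banned mask must be non-zero; an all-zero
      # mask means tuned clobbered the update made by the CRI-O hook.
      grep IRQBALANCE_BANNED_CPUS /etc/sysconfig/irqbalance
      # Compare the file's modification time against the container start time
      # in the CRI-O journal (as in the log excerpts above).
      stat -c '%y' /etc/sysconfig/irqbalance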

      Version-Release number of selected component (if applicable):
      4.11, but I suspect this goes all the way back to 4.8.

      How reproducible:
      Intermittent; depends on the order in which the app pod starts versus when the profile is applied.

      Steps to Reproduce:
      1. Create a pod with the runtime class, the irq-load-balancing.crio.io: "disable" annotation set, and a container with guaranteed CPUs.
      2. Reboot the node (reproduction is intermittent), or
      3. Simply do step 1 and restart the tuned pod (see the sketch after this list).
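
      A sketch of step 3 (the namespace and label are assumptions based on the Node Tuning Operator's tuned daemonset):

      # Restart the tuned pod on the affected node; on restart, tuned reapplies
      # the profile and resets IRQBALANCE_BANNED_CPUS.
      oc -n openshift-cluster-node-tuning-operator delete pod \
        -l openshift-app=tuned \
        --field-selector spec.nodeName=<node-name>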

      Actual results:
      After the tuned profile is (re)applied, IRQBALANCE_BANNED_CPUS is reset to 00000000, so interrupts can again be served on the CPUs of running guaranteed containers.

      Expected results:
      IRQBALANCE_BANNED_CPUS keeps the CPUs of guaranteed containers banned across tuned profile (re)applications, preserving interrupt isolation.

      Additional info:

              Assignee: Martin Sivak (msivak@redhat.com)
              Reporter: Martin Sivak (msivak@redhat.com)