  1. OpenShift Bugs
  2. OCPBUGS-20365

[4.13] All burstable pods run with the reserved cpu affinity mask when PerformanceProfile is applied


    • Critical
    • No
    • CNF Compute Sprint 244, CNF Compute Sprint 245
    • 2
    • Rejected
    • False
      Issue:

      On nodes configured with a Performance Profile, burstable containers run only on the reserved CPUs.

      Cause:

      RHEL 9 changed how CPU affinity interacts with cpusets: a task's affinity is no longer reset when its cpuset is changed.

      Fix:

      All components that touch the cpusets of newly started containers were updated to explicitly reset the CPU affinity.

      Result:

      Burstable containers now have access to all CPUs that are not currently assigned to pinned guaranteed containers.
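
As a rough illustration of what "explicitly reset the CPU affinity" can mean, the sketch below reads a cpuset's CPU list and applies it to a process with `sched_setaffinity`. It is not the actual runtime patch: the function names and the idea of passing the cpuset file path as a parameter are assumptions made for this example.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def reset_affinity_to_cpuset(pid, cpuset_cpus_file):
    """Set the affinity of `pid` to every CPU listed in the given cpuset file.

    `cpuset_cpus_file` is a hypothetical parameter: the caller supplies the
    path to the container's cpuset.cpus file.
    """
    with open(cpuset_cpus_file) as f:
        allowed = parse_cpu_list(f.read())
    # Linux-only call; replaces whatever mask the process inherited.
    os.sched_setaffinity(pid, allowed)
    return allowed
```

Without a step like this, a process moved into a wider cpuset on RHEL 9 keeps the narrower mask it inherited.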
      10/24: pending builds to test against in 4.13.z context; Yellow until timing is better understood
      9/26: crun/runc patches actively worked on by node team - new PR posted
      9/5: 9.3 kernel API change merged, waiting on QE to allow the 9.2 backport process to start
      8/21: pending on fix for RHELPLAN-161539, which has RHEL 9 merge requests up
      7/25: workaround being used successfully, QE is unblocked & TCFP; full solution pending RHELPLAN-161539

      Description of problem:

      All burstable pods run on the CPUs specified as reserved in the PerformanceProfile. This is caused by the *systemd.cpu_affinity=<reserved>* kernel argument, which is applied via the PerformanceProfile.
      
      This is problematic because the reserved set can be just 1 - 4 CPUs, while the node capacity allows many dozens of pods to be crammed onto them, not to mention the infrastructure components.
      
      All pods should really run in the "isolated" space.
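
The mechanism behind this is affinity inheritance: a child process starts with its parent's CPU affinity mask, so a mask applied to systemd propagates to what it starts unless something resets it. A minimal Python sketch of that inheritance follows, using the current process as a stand-in for systemd. The function name is made up for this example, and the calls are Linux-only.

```python
import os
import subprocess
import sys


def child_affinity_when_parent_pinned(cpu):
    """Pin this process to one CPU, start a child, and return the child's affinity."""
    original = os.sched_getaffinity(0)
    os.sched_setaffinity(0, {cpu})
    try:
        result = subprocess.run(
            [sys.executable, "-c",
             "import os; print(sorted(os.sched_getaffinity(0)))"],
            capture_output=True, text=True, check=True,
        )
    finally:
        # Restore the original mask so the caller is not left pinned.
        os.sched_setaffinity(0, original)
    return result.stdout.strip()
```

Calling it with any CPU from the current mask should report a child that is limited to that single CPU, even though nothing in the child asked for it.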

      Version-Release number of selected component (if applicable):

      4.14 CI builds of OCP, at least from 2023-06-15 onward.

      How reproducible:

      Always

      Steps to Reproduce:

      1. oc debug node/<worker>
      2. Find any burstable container process and run `taskset -pc <pid>`
      3. Observe that the CPU affinity contains all CPUs on the node
      
      4. Add a kernel argument systemd.cpu_affinity=0 (example MachineConfig is attached)
      5. oc debug node/<worker>
      6. Find any burstable container process again and run `taskset -pc <pid>`
      7. Observe the cpu affinity again
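
When many processes need checking, the `taskset -pc <pid>` calls in steps 2 and 6 can be approximated from Python. This is only a sketch: `format_cpu_list` and `affinity_of` are names invented here, `os.sched_getaffinity` is Linux-only, and the output format is not guaranteed to match `taskset` exactly.

```python
import os


def format_cpu_list(cpus):
    """Render a set of CPU ids as a compact list, e.g. {0, 1, 2, 3, 8} -> "0-3,8"."""
    ordered = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next CPU id is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


def affinity_of(pid):
    """Return the current CPU affinity of `pid` in list form."""
    return format_cpu_list(os.sched_getaffinity(pid))
```

Running `affinity_of(<pid>)` for the same container process before and after step 4 gives the two values that the results below compare.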
      

      Actual results:

      The affinity observed in step 7 matches the kernel argument value from step 4.

      Expected results:

      The affinity observed in step 7 matches the affinity from steps 2-3.

      Additional info:

      The cgroup of the burstable pod is set up properly and contains all CPUs (/sys/fs/cgroup/cpuset/kubepods.slice/.../cpuset.cpus) as expected.
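
The mismatch described here, a correct cpuset combined with a narrower affinity, can be expressed as a simple set comparison. The sketch below is illustrative only: the function name and the returned labels are assumptions, and the CPU sets are passed in as plain Python sets.

```python
def diagnose_affinity(affinity, cpuset_cpus, reserved):
    """Classify a container process by comparing its affinity with its cpuset.

    All three arguments are sets of CPU ids.
    """
    if affinity == cpuset_cpus:
        return "ok"
    if affinity and affinity <= reserved:
        # The cpuset is wider, but the task is still stuck on reserved CPUs.
        return "restricted to reserved CPUs"
    return "affinity differs from cpuset"
```

For example, on a hypothetical 8-CPU node with CPU 0 reserved, `diagnose_affinity({0}, set(range(8)), {0})` falls into the "restricted to reserved CPUs" case that this issue reports.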

            msivak@redhat.com Martin Sivak
            Shereen Haj
            Peter Hunt