OpenShift Bugs / OCPBUGS-34812

cgroupsv2: failed to write on cpuset.cpus.exclusive

    • Bug
    • Resolution: Unresolved
    • Critical
    • 4.16.z
    • 4.16.0
    • Node / CRI-O
    • None
    • -
    • Critical
    • No
    • Rejected
    • False
    • Previously, when using CPU load balancing on cgroupv2, a pod could fail to start if another pod with access to exclusive CPUs already existed. This could happen when a pod was deleted and another one was quickly created to replace it. With this update, the container runtime ensures that the CPUs from the old cpuset can be reassigned to the new cgroup after the old cgroup is deleted. As a result, deleted pods now release exclusive CPUs as expected before a new pod is created. (link:https://issues.redhat.com/browse/OCPBUGS-34812[*OCPBUGS-34812*])
    • Bug Fix
    • Done
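
      The release note above describes a race: the replacement pod asks for exclusive CPUs before the deleted pod's cgroup has released them. The sketch below shows one hypothetical way a caller could tolerate that window, by retrying the write when the kernel answers EINVAL. It is an illustration only and is not CRI-O's actual fix; the function name, retry count, and delay are assumptions.

```python
from __future__ import annotations

import errno
import time
from pathlib import Path
from typing import Callable


def write_exclusive_cpus(
    cgroup_dir: Path,
    cpus: str,
    attempts: int = 5,          # assumed value, not taken from the bug report
    delay_s: float = 0.1,       # assumed value, not taken from the bug report
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write `cpus` to cpuset.cpus.exclusive, retrying while the kernel says EINVAL.

    Returns the number of attempts used. Any error other than EINVAL, or
    EINVAL on the last attempt, is re-raised to the caller.
    """
    target = cgroup_dir / "cpuset.cpus.exclusive"
    for attempt in range(1, attempts + 1):
        try:
            target.write_text(cpus)
            return attempt
        except OSError as err:
            if err.errno != errno.EINVAL or attempt == attempts:
                raise
            # Give the old cgroup time to be removed; back off a little each round.
            sleep(delay_s * attempt)
    raise ValueError("attempts must be at least 1")
```

      A retry only hides the symptom if the old cgroup is never removed, so the error is still surfaced once the attempts are exhausted.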

      Description of problem:

          Deploying a workload that uses cpuset.cpus.exclusive fails: the container's pre-start hook cannot write to cpuset.cpus.exclusive.

      Version-Release number of selected component (if applicable):

          4.16.0-rc.1

      How reproducible:

          Always, with this particular workload deployed from its Helm chart

      Steps to Reproduce:

          1. Deploy the workload using the customer Helm chart

      Actual results:

          Error: failed to run pre-start hook for container "ctr-vxx-l2hi": set CPU load balancing: failed to write "2-6,34-38": write /sys/fs/cgroup/kubepods.slice/kubepods-pod6f8070d4_7d94_4d32_9000_b71d916b1263.slice/cpuset.cpus.exclusive: invalid argument
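
      One possible way to investigate this error is to check whether a sibling cgroup still holds some of the requested CPUs. The sketch below assumes the documented cgroup v2 constraint that a cgroup's cpuset.cpus.exclusive must not overlap the cpuset.cpus.exclusive of its siblings; the helper names are hypothetical, and EINVAL can have other causes that this check does not cover.

```python
from __future__ import annotations

from pathlib import Path


def parse_cpu_list(text: str) -> set[int]:
    """Parse a kernel CPU list such as "2-6,34-38" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def find_conflicts(target: Path, wanted: str) -> dict[str, set[int]]:
    """Return sibling cgroups whose cpuset.cpus.exclusive overlaps `wanted`.

    `target` is the cgroup directory the write was aimed at; siblings are
    the other directories under the same parent.
    """
    wanted_cpus = parse_cpu_list(wanted)
    conflicts: dict[str, set[int]] = {}
    for sibling in target.parent.iterdir():
        if sibling == target or not sibling.is_dir():
            continue
        try:
            held = parse_cpu_list((sibling / "cpuset.cpus.exclusive").read_text())
        except (FileNotFoundError, PermissionError):
            continue  # cpuset controller not enabled here, or file not readable
        overlap = wanted_cpus & held
        if overlap:
            conflicts[sibling.name] = overlap
    return conflicts
```

      On the affected node, this could be run from a debug shell against the pod slice directory named in the error, with "2-6,34-38" as the requested list. A non-empty result would point at a leftover pod cgroup that has not yet released its exclusive CPUs.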

      Expected results:

          Pod successfully deployed

      Additional info:

      This configuration does not use the RT kernel, although it uses NTO (Node Tuning Operator) and the telco profile:
      
      [labadmin@TestSrv ~]$ oc debug node/master0.sno2.xxx.fx.nsn-rdnet.net -- uname -r
      Starting pod/master0sno2xxx-rdnetnet-debug ...
      To use host binaries, run `chroot /host`
      5.14.0-427.13.1.el9_4.x86_64
      
      Removing debug pod ...
      
      
      Other relevant links:

      Slack conversation: https://redhat-internal.slack.com/archives/CQNBUEVM2/p1717400901646359?thread_ts=1712824698.884209&cid=CQNBUEVM2
      Logs: https://drive.google.com/drive/folders/1gNQS45-NkLMC4gBPECaOmITgKsIwLoy_?usp=sharing

              pehunt@redhat.com Peter Hunt
              cdonato@redhat.com Carlos Donato
              Sunil Choudhary
              Ronan Hennessy
              Mallapadi Niranjan