OpenShift Bugs / OCPBUGS-17792

CPUs used by pods with the CPU load balancing disabled annotation are not returned to scheduling domains after the pods are deleted


    • Important
    • No
    • CNF Compute Sprint 242, CNF Compute Sprint 243, CNF Compute Sprint 244, CNF Compute Sprint 245
    • 4
    • Rejected
    • False
      * Due to an issue with Kubernetes, the CPU Manager is unable to return CPU resources from the last pod admitted to a node to the pool of available CPU resources. These resources are allocatable if a subsequent pod is admitted to the node. However, that pod in turn becomes the last admitted pod, and again the CPU Manager cannot return its resources to the available pool.
      +
      This issue affects CPU load balancing features because these features depend on the CPU Manager releasing CPUs to the available pool. Consequently, non-guaranteed pods might run with a reduced number of CPUs. As a workaround, schedule a pod with a `best-effort` CPU Manager policy on the affected node. This pod becomes the last admitted pod, which ensures that the resources are correctly released to the available pool. (link:https://issues.redhat.com/browse/OCPBUGS-17792[*OCPBUGS-17792*])
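The workaround described above can be sketched as a pod manifest. This is a minimal illustration, not a tested manifest: the pod name, image, and node name are assumptions, and the absence of a `resources` stanza is what places the pod in the best-effort tier.

```shell
# Hypothetical workaround pod (all names illustrative). With no resource
# requests or limits, the pod gets BestEffort QoS, so it becomes the last
# admitted pod without taking exclusive CPUs, which lets the CPU Manager
# release the previous pod's CPUs back to the shared pool.
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: cpu-release-workaround
spec:
  nodeName: worker-0        # pin to the affected node (illustrative name)
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
    # no resources stanza -> BestEffort QoS
EOF
```

Once the CPUs have been released, the workaround pod can simply be deleted.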
    • Known Issue
    • Done
      20240410: 4.14.z fix waiting for u/s merge to happen, still not moving
      10/31: 4.14.z fix waiting for u/s merge to happen, still not moving
      10/18: added to known issues, 4.14.z fix waiting for u/s merge to happen
      10/10: added to known issues, 4.14.z fix waiting for u/s merge to happen
      10/3: will release note this for 4.14 & pursue fix in 4.14.z. The fix is likely a backport of upstream change.


      Description of problem:

      Currently, when a Guaranteed pod is started with CPU load balancing disabled, the CPUs allocated to it are removed from all scheduling domains. However, after the pod is deleted, the CPUs do not return to their original configuration.
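      A pod of the kind described above can be sketched as follows. This is a minimal illustration, assuming a PerformanceProfile is applied to the node: the pod name, image, runtime class name, and CPU count are all illustrative, and equal integer CPU requests and limits are what give the pod Guaranteed QoS with exclusive CPUs.

```shell
# Hypothetical Guaranteed pod with CPU load balancing disabled
# (names and values illustrative).
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gu-pod
  annotations:
    cpu-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-example-profile  # from the PerformanceProfile (illustrative)
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "2"           # integer CPUs, requests == limits -> Guaranteed QoS
        memory: 100Mi
      limits:
        cpu: "2"
        memory: 100Mi
EOF
```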
       

      Version-Release number of selected component (if applicable):

      4.14.0-0.ci-2023-08-11-000617
       

      How reproducible:

      Not every time, but it does happen.
       

      Steps to Reproduce:

      1. Create a Guaranteed (gu) pod with the cpu-load-balancing.crio.io: "disable" annotation
      2. After the pod is in the Running state, check the CPUs allotted to the pod
      3. Check /proc/schedstat and verify that the CPUs used by the gu pod are not part of any scheduling domains
      4. Delete the pod
      5. Verify that /proc/schedstat is updated, i.e. the CPUs used by the gu pod are again part of scheduling domains
      6. Check the cpuset.cpus of any burstable pod; the CPUs that were used by the gu pod should have been added back to its cpuset
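      Steps 3 and 5 rely on the layout of /proc/schedstat: each cpu<N> line is followed by one domain<M> line per scheduling domain that CPU belongs to, so a CPU removed from load balancing shows no domain lines under its entry. A minimal sketch of the check, run here over a fabricated excerpt (on a real node, read /proc/schedstat directly; the statistics columns are dummy values):

```shell
# Fabricated /proc/schedstat excerpt: cpu1 has no domain lines, i.e. it is
# excluded from load balancing; cpu0 and cpu2 each belong to one domain.
schedstat=$(cat <<'EOF'
version 15
timestamp 4302079952
cpu0 0 0 0 0 0 0 1076 203 60
domain0 ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
cpu1 0 0 0 0 0 0 990 180 55
cpu2 0 0 0 0 0 0 850 140 48
domain0 ff 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
EOF
)

# List the CPUs that are members of at least one scheduling domain.
balanced=$(echo "$schedstat" | awk '
  /^cpu[0-9]/  { cpu = $1 }
  /^domain/    { in_domain[cpu] = 1 }
  END          { for (c in in_domain) print c }
' | sort)
echo "$balanced"
```

      After step 4, rerunning the same check against the node's real /proc/schedstat should list the gu pod's CPUs again; in this bug they stay absent.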
      

      Actual results:

      5. /proc/schedstat is not updated; even after waiting a long time, the CPUs used by the gu pod do not rejoin any scheduling domains.
      6. The cpuset.cpus of burstable pods is not updated; the CPUs that were used by the gu pod are not added back to their cpusets.
       

      Expected results:

      5. /proc/schedstat is updated, i.e. the CPUs used by the gu pod are again part of scheduling domains.
      6. The cpuset.cpus of burstable pods is updated to include the CPUs that were used by the gu pod.
       

      Additional info:

       

            fromani@redhat.com Francesco Romani
            mniranja Mallapadi Niranjan
            Sunil Choudhary Sunil Choudhary