OpenShift Bugs / OCPBUGS-17792

Cpu used by pods with cpu load balancing disabled annotation when deleted are not part of scheduling domains


    • Important
    • No
    • CNF Compute Sprint 242, CNF Compute Sprint 243, CNF Compute Sprint 244, CNF Compute Sprint 245
    • 4
    • Rejected
    • False
      * Due to an issue with Kubernetes, the CPU Manager is unable to return CPU resources from the last pod admitted to a node to the pool of available CPU resources. These resources are allocatable if a subsequent pod is admitted to the node. However, that pod in turn becomes the last pod, and again the CPU Manager cannot return its resources to the available pool.
      +
      This issue affects CPU load balancing features because these features depend on the CPU Manager releasing CPUs to the available pool. Consequently, non-guaranteed pods might run with a reduced number of CPUs. As a workaround, schedule a pod with a `best-effort` CPU Manager policy on the affected node. This pod becomes the last admitted pod, which ensures that the resources are correctly released to the available pool. (link:https://issues.redhat.com/browse/OCPBUGS-17792[*OCPBUGS-17792*])
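      The workaround can be sketched as a minimal pod manifest. Omitting the `resources` stanza entirely gives the pod the `BestEffort` QoS class, so the static CPU Manager does not pin exclusive CPUs to it; the pod name, node name, and image below are illustrative assumptions, not taken from this bug report:

      ```yaml
      # Hypothetical best-effort pod: no resources stanza => BestEffort QoS class,
      # so the CPU Manager does not assign it exclusive CPUs. Pinning it to the
      # affected node makes it the last pod admitted there, which lets the CPU
      # Manager release the previously stuck CPUs back to the shared pool.
      apiVersion: v1
      kind: Pod
      metadata:
        name: cpu-release-workaround   # illustrative name
      spec:
        nodeName: worker-0             # replace with the affected node
        containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
      ```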
    • Known Issue
    • Done
      2024/11/19: will be fixed in 4.19, and we will document the workaround for the previous versions.
      2024/11/04: the upstream fix was merged in kubernetes 1.32. Backport is unlikely.
      2024/10/22: the one and only prerequisite PR to merge the upstream PR which will fix this issue is at last being reviewed!

      Description of problem:

      Currently, when a guaranteed pod is started with the CPU load balancing disabled annotation, the CPUs allocated to it are removed from all scheduling domains. However, after the pod is deleted, the CPUs are not returned to their original scheduling configuration.
       

      Version-Release number of selected component (if applicable):

      4.14.0-0.ci-2023-08-11-000617
       

      How reproducible:

      Not every time, but it does happen.
       

      Steps to Reproduce:

      1. Create a guaranteed (GU) pod with the cpu-load-balancing.crio.io: "disable" annotation.
      2. After the pod is in the Running state, check the CPUs allotted to the pod.
      3. Check /proc/schedstat and verify that the CPUs used by the GU pod are not part of any scheduling domain.
      4. Delete the pod.
      5. Verify that /proc/schedstat is updated, that is, the CPUs used by the GU pod are part of scheduling domains again.
      6. Check the cpuset.cpus of any burstable pod; the CPUs that were used by the GU pod should be added back to its cpuset.
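      As a sketch of step 1: a guaranteed pod needs whole-number CPU requests equal to its limits (on a node running the static CPU Manager policy) so that it receives exclusive CPUs, plus the CRI-O annotation. The pod name, runtime class, CPU count, and image below are illustrative assumptions:

      ```yaml
      # Hypothetical guaranteed pod: requests == limits with integer CPUs gives
      # the Guaranteed QoS class, so the static CPU Manager pins exclusive CPUs
      # to it. The annotation asks CRI-O to disable CPU load balancing for
      # those CPUs, removing them from the kernel's scheduling domains.
      apiVersion: v1
      kind: Pod
      metadata:
        name: gu-pod-lb-disabled               # illustrative name
        annotations:
          cpu-load-balancing.crio.io: "disable"
      spec:
        runtimeClassName: performance-example  # assumes a performance-profile runtime class
        containers:
        - name: app
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "2"
              memory: "256Mi"
            limits:
              cpu: "2"
              memory: "256Mi"
      ```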
      

      Actual results:

      5. /proc/schedstat is not updated even after waiting a long time; the CPUs used by the GU pod do not rejoin the scheduling domains.
      6. The cpuset.cpus of burstable pods is not updated; the CPUs that were used by the GU pod are not added back to their cpusets.
       

      Expected results:

      5. /proc/schedstat is updated, that is, the CPUs used by the GU pod are part of scheduling domains again.
      6. The cpuset.cpus of any burstable pod includes the CPUs that were used by the GU pod, added back to its cpuset.
       

      Additional info:

       

              fromani@redhat.com Francesco Romani
              mniranja Mallapadi Niranjan
              Sunil Choudhary