OCPBUGS-17792

CPUs used by pods with the CPU load balancing disabled annotation are not returned to scheduling domains after the pod is deleted


      2024/11/19: will be fixed in 4.19, and we will document the workaround for the previous versions.
      2024/11/04: the upstream fix was merged in kubernetes 1.32. Backport is unlikely.
      2024/10/22: the one and only prerequisite PR to merge the upstream PR which will fix this issue is at last being reviewed!








    • CNF Compute Sprint 242, CNF Compute Sprint 243, CNF Compute Sprint 244, CNF Compute Sprint 245
    • Done
    • Bug Fix
      * Previously, CPUs for the last guaranteed pod admitted to a node remained allocated after the pod was deleted. As a consequence, this caused scheduling domain inconsistencies. With this release, CPUs allocated to guaranteed pods return to the pool of available CPU resources as expected, ensuring correct CPU scheduling for subsequent pods. (link:https://issues.redhat.com/browse/OCPBUGS-17792[OCPBUGS-17792])

      Description of problem:

      Currently, when a guaranteed pod is started with the CPU load balancing disabled annotation, the CPUs allocated to it are removed from all scheduling domains. However, after the pod is deleted, those CPUs are not returned to their original configuration.
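      For reference, in /proc/schedstat each cpuN line is followed by one domainN line per scheduling domain that CPU belongs to, so a CPU whose load balancing has been disabled shows no domain lines. A minimal sketch to list each CPU with its domain count:

# Print each CPU together with the number of scheduling domains it belongs to;
# a count of 0 means the CPU takes no part in load balancing.
awk '
  /^cpu[0-9]/ { if (cpu != "") print cpu, doms; cpu = $1; doms = 0; next }
  /^domain/   { doms++ }
  END         { if (cpu != "") print cpu, doms }
' /proc/schedstat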
       

      Version-Release number of selected component (if applicable):

      4.14.0-0.ci-2023-08-11-000617
       

      How reproducible:

      Not every time, but it does happen.
       

      Steps to Reproduce:

      1. Create a guaranteed (gu) pod with the cpu-load-balancing.crio.io: "disable" annotation (a reproducer sketch follows after this list).
      2. After the pod reaches the Running state, check the CPUs allotted to the pod.
      3. Check /proc/schedstat and verify that the CPUs used by the gu pod are not part of any scheduling domain.
      4. Delete the pod.
      5. Verify that /proc/schedstat is updated, i.e. the CPUs used by the gu pod are part of the scheduling domains again.
      6. Check the cpuset.cpus of any burstable pod; the CPUs that were used by the gu pod should be added back to its cpuset.
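      A rough reproducer sketch for the steps above; the pod name, image, runtime class, and node name are placeholders, and a node tuned by a performance profile (cgroup v2 with the systemd cgroup driver) is assumed:

# Step 1: guaranteed pod (integer CPU requests equal to limits) carrying the annotation.
cat <<'EOF' | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gu-lb-disabled
  annotations:
    cpu-load-balancing.crio.io: "disable"
spec:
  runtimeClassName: performance-example        # placeholder: runtime class created by the performance profile
  restartPolicy: Never
  containers:
  - name: app
    image: registry.example.com/sleep:latest   # placeholder image
    command: ["sleep", "inf"]
    resources:
      requests:
        cpu: "2"
        memory: "256Mi"
      limits:
        cpu: "2"
        memory: "256Mi"
EOF

# Steps 2-3: record the CPUs pinned to the pod and confirm they have no
# "domain" lines in /proc/schedstat on the node.
oc exec gu-lb-disabled -- cat /sys/fs/cgroup/cpuset.cpus.effective   # cgroup v1 path: cpuset/cpuset.cpus
oc debug node/<node-name> -- chroot /host cat /proc/schedstat

# Steps 4-6: delete the pod, then re-check /proc/schedstat and the cpuset of
# the burstable slice; the freed CPUs should reappear in both.
oc delete pod gu-lb-disabled
oc debug node/<node-name> -- chroot /host cat /proc/schedstat
oc debug node/<node-name> -- chroot /host cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/cpuset.cpus.effective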
      

      Actual results:

      5. /proc/schedstat is not updated: even after waiting for a long time, the CPUs used by the gu pod remain outside the scheduling domains.
      6. The CPUs that were used by the gu pod are not added back to the cpuset.cpus of the burstable pods.
       

      Expected results:

      5. /proc/schedstat is updated, i.e. the CPUs used by the gu pod are part of the scheduling domains again.
      6. Any burstable pod's cpuset.cpus includes the CPUs that were used by the gu pod.
       

      Additional info:

       

      Assignee: Francesco Romani (fromani@redhat.com)
      Reporter: Mallapadi Niranjan (mniranja)