Loading...

XML

Word

Printable

Type: Bug
Resolution: Won't Do
Priority: Major
Fix Version/s: None
Affects Version/s: 4.18
Component/s: Node / CPU manager
Labels:
- Regression
- Telco
- triaged
- trincr

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
None
Severity:
Important
Regression:
No
Latest Status Summary:

Hide
2024/11/19: will be fixed in 4.19, and we will document the workaround for the previous versions.
2024/11/04: the upstream fix was merged in kubernetes 1.32. Backport is unlikely.
2024/10/22: the one and only prerequisite PR to merge the upstream PR which will fix this issue is at last being reviewed!

Show
2024/11/19: will be fixed in 4.19, and we will document the workaround for the previous versions. 2024/11/04: the upstream fix was merged in kubernetes 1.32. Backport is unlikely. 2024/10/22: the one and only prerequisite PR to merge the upstream PR which will fix this issue is at last being reviewed!

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
None

Internal Whiteboard:
RH Private Keywords:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
In Progress
Release Note Type:
Known Issue
Release Note Text:

Hide
* Due to an issue with Kubernetes, the CPU Manager is unable to return CPU resources from the last pod admitted to a node to the pool of available CPU resources. These resources are allocatable if a subsequent pod is admitted to the node. However, this in turn becomes the last pod, and again, the CPU manager cannot return this pod's resources to the available pool.
+
This issue affects CPU load balancing features because these features depend on the CPU Manager releasing CPUs to the available pool. Consequently, non-guaranteed pods might run with a reduced number of CPUs. As a workaround, schedule a pod with a `best-effort` CPU Manager policy on the affected node. This pod will be the last admitted pod and this ensures the resources will be correctly released to the available pool.(link:https://issues.redhat.com/browse/OCPBUGS-46428[*~~OCPBUGS-46428~~*])

Show
* Due to an issue with Kubernetes, the CPU Manager is unable to return CPU resources from the last pod admitted to a node to the pool of available CPU resources. These resources are allocatable if a subsequent pod is admitted to the node. However, this in turn becomes the last pod, and again, the CPU manager cannot return this pod's resources to the available pool. + This issue affects CPU load balancing features because these features depend on the CPU Manager releasing CPUs to the available pool. Consequently, non-guaranteed pods might run with a reduced number of CPUs. As a workaround, schedule a pod with a `best-effort` CPU Manager policy on the affected node. This pod will be the last admitted pod and this ensures the resources will be correctly released to the available pool.(link: https://issues.redhat.com/browse/OCPBUGS-46428 [* OCPBUGS-46428 *])

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Current when the Guaranteed pods is started with cpu load balancing is disabled the cpus allocated are removed from joining any scheduling domains. but after the pod is deleted, the cpus do not go back to it's original configuration

Version-Release number of selected component (if applicable):

4.14.0-0.ci-2023-08-11-000617

How reproducible:

Not everytime but does happen

Steps to Reproduce:

1. Create a guaranteed(gu) pod with cpu-load-balancing.crio.io: "disable" annotation
2. After the pods is running state, check the cpus alloted to the pod
3. Check /proc/schedstat and verify the cpus used by gu pod are not part of scheduling domains
4. Delete the pod
5. Verify /proc/schedstat is updated (even after waiting for long time it doesn't get updated , i.e cpus used by gu pod should be part of scheduling domains
6. Verify any burstable pod and check it's cpuset.cpus , they should have the cpus that were used by gu pod, added back to it's cpuset.

Actual results:

5. Verify /proc/schedstat is updated (even after waiting for long time it doesn't get updated , i.e cpus used by gu pod should be part of scheduling domains
6. Verify any burstable pod and check it's cpuset.cpus , they should have the cpus that were used by gu pod, added back to it's cpuset. This doesn't happen.

Expected results:

5. Verify /proc/schedstat is updated i.e cpus used by gu pod should be part of scheduling domains
6. Verify any burstable pod and check it's cpuset.cpus , they should have the cpus that were used by gu pod, added back to it's cpuset.

Additional info:

clones

OCPBUGS-17792 Cpu used by pods with cpu load balancing disabled annotation when deleted are not part of scheduling domains

Closed

Assignee:: Francesco Romani

Reporter:: Niranjan Mallapadi Raghavendra Rao

Need Info From:: None

Contributors:: None

QA Contact:: Aditi Sahay

Doc Contact:: None

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Created:: 2024/12/15 5:13 PM

Updated:: 2025/10/02 10:54 PM

Resolved:: 2025/05/15 1:14 AM

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates