Type: Story
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.12, 4.13, 4.14, 4.11
Labels:
None

Activity Type:
Future Sustainability
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
CNTRLPLANE-1319
Story Points:
5

Target Version:
None
Release Blocker:
None
Sprint:
None

We need to write an article about resource consumption, in particular the growing memory consumption of the kube-scheduler. The same growth is inherently present in every secondary scheduler built on the scheduling framework, and in the descheduler.

We need some reliable measurements. We can ask the performance team whether they have kube-scheduler data they can share with us. If they don't, we can ask them to measure the memory consumption of the kube-scheduler with respect to the number of pods (10k, 20k, 30k, ...). Alternatively, we can create our own cluster with many nodes and check the memory consumption while increasing the number of pods.

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.12/html/[…]mance/planning-your-environment-according-to-object-maximums claims a single node can run up to 500 pods, so 10k pods fit on 20 nodes. Each pod can run a very simple command that consumes almost no resources. We can go through 10k, 20k, 30k, 40k, and 50k pods with 20, 40, 60, 80, and 100 nodes, respectively.

Once we have the data, we can justify why we need to omit the memory resource limit. We can measure the CPU consumption as well, just to be sure. The same measurements should be repeated for the descheduler.
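
The pod/node pairing above is plain arithmetic and can be sketched as follows. This is only an illustration: the function names are hypothetical, and the 500 pods-per-node figure is the one quoted from the linked documentation, so it should be re-checked for the OCP version under test.

```python
import math

# Per-node pod maximum quoted in the description (assumption: verify per OCP version).
PODS_PER_NODE = 500


def nodes_needed(pods: int, pods_per_node: int = PODS_PER_NODE) -> int:
    """Smallest number of nodes that can host `pods` pods."""
    return math.ceil(pods / pods_per_node)


def measurement_plan(max_pods: int = 50_000, step: int = 10_000):
    """List of (pod count, node count) pairs for each measurement step."""
    return [(pods, nodes_needed(pods)) for pods in range(step, max_pods + 1, step)]


if __name__ == "__main__":
    for pods, nodes in measurement_plan():
        print(f"{pods:>6} pods -> {nodes:>3} nodes")
```

With the defaults this reproduces the 10k to 50k pods on 20 to 100 nodes progression listed in the description.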

Acceptance criteria:

Memory/CPU resource consumption of the kube-scheduler and the descheduler, measured with respect to the number of pods in a cluster (in multiples of 10k pods), e.g. for various OCP versions.
An article (e.g. a KCS) describing the resource consumption, with some graphs.
A justification for why scheduler/descheduler memory resource limits (or even CPU limits) cannot be bounded for clusters with a huge number of pods. Possible outcomes: removing the resource limits, or making them configurable.
A discussion with the upstream communities about whether there is a way to reduce the memory consumption. It is probably impossible to get below linear consumption with respect to the number of pods.
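
Once measurements exist, the expected linear relationship between pod count and memory could be checked with a simple least-squares fit. The sketch below is an assumption about how that analysis might look, not an agreed method; `fit_line` and `projected_memory` are hypothetical helper names, and no real measurements are included.

```python
from typing import Sequence, Tuple


def fit_line(samples: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least-squares fit of memory = intercept + slope * pods.

    `samples` holds (pod count, memory) pairs in any consistent memory unit.
    Returns (slope, intercept).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two (pods, memory) samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("pod counts must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def projected_memory(pods: float, slope: float, intercept: float) -> float:
    """Memory predicted by the fitted line for a given pod count."""
    return intercept + slope * pods
```

Feeding in the measured (pods, memory) pairs gives a per-pod memory cost (the slope) and a baseline (the intercept), which could back the argument that a fixed memory limit cannot hold for arbitrarily large clusters.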

is related to

OCPBUGS-1995 Descheduler pod is OOM killed when using descheduler-operator profiles on big clusters

Closed

Assignee: Unassigned

Reporter: Jan Chaloupka

Need Info From: None

Contributors: None

QA Contact: None

Doc Contact: None

Votes: 0

Watchers: 2

Created: 2023/04/28 3:30 PM

Updated: 2025/06/27 8:28 AM
