-
Story
-
Resolution: Unresolved
-
Normal
-
None
-
4.12, 4.13, 4.14, 4.11
-
None
-
Upstream
-
5
-
False
-
None
-
False
-
OCPSTRAT-46 - Strategic Upstream Work - OCP Control Plane and Node Lifecycle Group
-
-
We need to write an article about resource consumption. It should cover the growing memory consumption of the kube-scheduler, which is inherently present in every secondary scheduler that uses the scheduling framework, and of the descheduler.

We need some reliable measurements. We can ask the performance team whether they have data about the kube-scheduler they can share with us. If they don't, we can ask them to measure the memory consumption of the kube-scheduler with respect to the number of pods (10k, 20k, 30k, ...). Alternatively, we can create our own cluster with many nodes and check the memory consumption while increasing the number of pods.

https://access.redhat.com/documentation/en-us/openshift_container_platform/4.12/html/[…]mance/planning-your-environment-according-to-object-maximums claims a single node can run up to 500 pods, so 10k pods need 20 nodes. Each pod can run a very simple command that consumes almost no resources. We can go through 10k, 20k, 30k, 40k, 50k pods with 20, 40, 60, 80, 100 nodes.

Once we have the data we can justify why we need to omit the memory resource limit. We can measure the CPU consumption as well, just to be sure. The same measurements need to be repeated for the descheduler.
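The node counts above follow directly from the per-node pod maximum. A minimal sketch of that arithmetic, assuming the 500 pods-per-node figure from the linked documentation; the names `nodes_needed` and `measurement_plan` are hypothetical helpers introduced only for illustration:

```python
import math

# Per-node pod maximum taken from the description above (assumption:
# the cluster is configured to allow this many pods per node).
PODS_PER_NODE_MAX = 500


def nodes_needed(pod_count, pods_per_node=PODS_PER_NODE_MAX):
    """Return the minimum number of nodes able to hold pod_count pods."""
    return math.ceil(pod_count / pods_per_node)


def measurement_plan(start=10_000, stop=50_000, step=10_000):
    """Return (pods, nodes) pairs for each measurement step."""
    return [(pods, nodes_needed(pods)) for pods in range(start, stop + 1, step)]


if __name__ == "__main__":
    for pods, nodes in measurement_plan():
        print(f"{pods:>6} pods -> {nodes:>3} nodes")
```

These are lower bounds: system pods also occupy pod slots on each node, so a real test cluster would likely need a few more nodes than the plan lists.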
Acceptance criteria:
- memory/CPU resource consumption of the kube-scheduler/descheduler with respect to the number of pods in a cluster (in multiples of 10k pods), e.g. for various OCP versions.
- article (e.g. KCS) mentioning the resource consumption with some graphs
- justification for why scheduler/descheduler memory resource limits (or even CPU limits) cannot be bounded for clusters with a huge number of pods. Possibly remove the resource limits, or make them configurable.
- discussion with upstream communities about whether there is a way to reduce the memory consumption. It is probably impossible to get less than linear consumption with respect to the number of pods.
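Once measurements exist, the linear-growth expectation can be checked with an ordinary least-squares fit of memory against pod count. The following is a rough sketch under stated assumptions, not a prescribed method: `fit_line` is a hypothetical helper, and the sample values in the usage section are synthetic placeholders used only to exercise the function, not measurements.

```python
def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept.

    xs: pod counts, ys: memory readings (any consistent unit).
    Returns (slope, intercept).
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Synthetic placeholder values, NOT real measurements:
    # replace with (pod count, memory) pairs collected from the cluster.
    pods = [1, 2, 3]
    memory = [3, 5, 7]
    slope, intercept = fit_line(pods, memory)
    print(f"memory per pod: {slope}, baseline: {intercept}")
```

The slope estimates the memory cost per additional pod and the intercept the baseline footprint. Residuals that grow with the pod count would suggest the consumption is worse than linear.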
- is related to
-
OCPBUGS-1995 Descheduler pod is OOM killed when using descheduler-operator profiles on big clusters
- Closed