OpenShift Request For Enhancement / RFE-4894

Add per-node terminated pod eviction threshold to the pod garbage collector


    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Critical
    • Component: Node

      1. Proposed title of this feature request

      Add per-node terminated pod eviction threshold to the pod garbage collector

      2. What is the nature and description of the request?

      We are requesting a kube-controller-manager flag similar to terminated-pod-gc-threshold, but applied per node, i.e. one that deletes terminated pods on a given node if their count grows beyond the threshold. This flag should work in addition to the already existing terminated-pod-gc-threshold.

      The recommended default threshold should be on the order of max-pods, and no more than double it (the reasoning is explained below).
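
      To make the request concrete, below is a minimal sketch of what such a per-node pass could look like, modelled loosely on the existing terminated-pod GC in kube-controller-manager. The function name gcTerminatedPerNode and the perNodeThreshold parameter are assumptions for illustration, not a proposed implementation.

      // Sketch only: a per-node terminated-pod GC pass that would run in addition
      // to the existing cluster-wide terminated-pod GC. Names are illustrative.
      package podgcsketch

      import (
          "context"
          "fmt"
          "sort"

          corev1 "k8s.io/api/core/v1"
          metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
          "k8s.io/client-go/kubernetes"
      )

      // isTerminated mirrors the pod GC notion of a finished pod.
      func isTerminated(pod *corev1.Pod) bool {
          return pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed
      }

      // gcTerminatedPerNode deletes the oldest terminated pods on each node whose
      // terminated-pod count exceeds perNodeThreshold, independently of the
      // cluster-wide threshold.
      func gcTerminatedPerNode(ctx context.Context, client kubernetes.Interface, pods []*corev1.Pod, perNodeThreshold int) error {
          byNode := map[string][]*corev1.Pod{}
          for _, pod := range pods {
              if isTerminated(pod) && pod.Spec.NodeName != "" {
                  byNode[pod.Spec.NodeName] = append(byNode[pod.Spec.NodeName], pod)
              }
          }
          for node, terminated := range byNode {
              excess := len(terminated) - perNodeThreshold
              if excess <= 0 {
                  continue
              }
              // Delete the oldest terminated pods first, as the cluster-wide GC does.
              sort.Slice(terminated, func(i, j int) bool {
                  return terminated[i].CreationTimestamp.Before(&terminated[j].CreationTimestamp)
              })
              for _, pod := range terminated[:excess] {
                  if err := client.CoreV1().Pods(pod.Namespace).Delete(ctx, pod.Name, metav1.DeleteOptions{}); err != nil {
                      return fmt.Errorf("deleting pod %s/%s on node %s: %w", pod.Namespace, pod.Name, node, err)
                  }
              }
          }
          return nil
      }

      Wired into the pod GC controller's sync loop, such a pass would complement, not replace, the existing terminated-pod-gc-threshold behaviour.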

      3. Why does the customer need this? (List the business requirements here)

      If the number of exited containers on a node goes above a certain threshold, the kubelet ends up unresponsive because the gRPC response returned over the CRI socket becomes too big.

      The global, cluster-wide terminated-pod-gc-threshold limit is not enough: the cluster can have fewer terminated pods than the cluster-wide threshold and yet have enough of them on a single node to overwhelm that node. This is why we need a per-node threshold.

      Increasing the maximum gRPC message size is not an option either: this was done in the past, but we cannot keep increasing it forever.

      In addition:

      • We cannot rely on --maximum-dead-containers and/or --maximum-dead-containers-per-container in the long term, because those flags are deprecated. There is no firm removal date yet (it depends on upstream goals that have not been prioritized), but they will eventually be cleaned up.
      • We cannot rely on the current container eviction mechanisms, because those trigger only on storage-consumption thresholds, while the gRPC overload is caused purely by the number of containers: create enough pods with small enough containers and storage-based eviction is never triggered (see the rough estimate after this list).
      • We cannot rely on max-pods, because it does not count terminated pods.
      • We cannot rely on users' good will and/or carefulness, because we need to protect against intentional DoS.
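
      As a rough illustration of the container-count point above, the back-of-envelope sketch below estimates how many container records fit in a single CRI list response before a kubelet-side gRPC message cap is reached. Both the 16 MiB cap and the average per-record size are assumptions for illustration, not measured values.

      // Back-of-envelope only: the CRI list response grows with the number of
      // container records, not with how much storage those containers consume.
      package main

      import "fmt"

      func main() {
          const maxGRPCMessageBytes = 16 * 1024 * 1024 // assumed kubelet-side gRPC message cap
          const bytesPerContainerRecord = 4 * 1024     // assumed average size of one container record (IDs, labels, annotations, image refs)

          // Roughly how many container records fit before the response is rejected.
          fmt.Println(maxGRPCMessageBytes / bytesPerContainerRecord) // ~4096 records
      }

      Under these assumptions, a few thousand leftover containers on a single node are enough to break the kubelet's CRI calls, regardless of how little disk they consume.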

      Last but not least: the reason to suggest a default threshold on the same order as max-pods is that the goal of this threshold is to protect the individual kubelets, not to protect the kube-apiserver from an excessive number of pod API objects.
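
      A hedged sizing example for the suggested default, assuming a max-pods of 250 and the same illustrative per-record figures as above (both assumptions, not recommendations):

      // Illustrative sizing only: shows that a per-node threshold between max-pods
      // and 2 x max-pods keeps the worst-case CRI list response small.
      package main

      import "fmt"

      func main() {
          const maxPods = 250                  // assumed node max-pods
          const containersPerPod = 2           // assumed average, including exited containers
          const bytesPerContainerRecord = 4096 // same illustrative figure as above

          for _, threshold := range []int{maxPods, 2 * maxPods} {
              records := threshold * containersPerPod
              fmt.Printf("per-node threshold %d -> ~%d container records, ~%.1f MiB CRI list response\n",
                  threshold, records, float64(records*bytesPerContainerRecord)/(1024*1024))
          }
      }

      Both values stay far below the assumed 16 MiB cap, which is the point of keeping the default on the same order as max-pods.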

      4. List any affected packages or components.

      • kube-controller-manager
      • cluster-kube-apiserver-operator

            gausingh@redhat.com Gaurav Singh
            rhn-support-palonsor Pablo Alonso Rodriguez