Feature Request
Resolution: Unresolved
Future Sustainability
Proposed title of this feature request:
Make terminated-pod-gc-threshold a Supported Configuration in OpenShift
What is the nature and description of the request?
Please support terminated-pod-gc-threshold as a first-class, upgrade-safe configuration in OpenShift.
This will let users tune pod garbage collection to prevent resource exhaustion and maintain cluster health, especially during node shutdown events. See https://access.redhat.com/solutions/6996490
Why does the customer need this?
The network-node-identity pod has a broad toleration (it matches all taints), so while a node is shutting down gracefully, this pod keeps being scheduled onto the node that is shutting down. The node correctly rejects it each time, and the resulting failed pods accumulate in `ContainerStatusUnknown`. They are not garbage collected soon enough, and some services run out of memory (OOM) as a result.
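To illustrate why the failed pods pile up, here is a minimal Python sketch of threshold-based collection of terminated pods. It is a simplified model, not the actual controller code: the `Pod` class and `pods_to_collect` function are hypothetical names, and the sketch assumes the upstream Kubernetes behavior in which terminated pods are only deleted once their count exceeds `terminated-pod-gc-threshold` (12500 by default upstream), oldest first.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Hypothetical, minimal stand-in for a pod object."""
    name: str
    phase: str      # e.g. "Running", "Succeeded", "Failed"
    created: int    # creation time as a sortable number


def pods_to_collect(pods, threshold):
    """Return the terminated pods a threshold-based GC would delete.

    Assumption: only pods in phase Succeeded or Failed count as
    terminated, and a threshold <= 0 disables collection.
    """
    if threshold <= 0:
        return []
    terminated = [p for p in pods if p.phase in ("Succeeded", "Failed")]
    excess = len(terminated) - threshold
    if excess <= 0:
        # Below the threshold nothing is deleted, so failed pods accumulate.
        return []
    # Delete the oldest pods first until the count is back at the threshold.
    terminated.sort(key=lambda p: p.created)
    return terminated[:excess]


if __name__ == "__main__":
    pods = [Pod(f"rejected-{i}", "Failed", i) for i in range(5)]
    pods.append(Pod("healthy", "Running", 99))
    print([p.name for p in pods_to_collect(pods, threshold=2)])
```

Under this model, a cluster with a high threshold keeps every rejected pod until the total number of terminated pods crosses it, which is why a lower, supported value would bound the accumulation described above.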
List any affected packages or components.
What is the business impact?
We encountered an outage caused by accumulated pods in `ContainerStatusUnknown` that were not garbage collected soon enough. We had to disable graceful shutdown to mitigate this, but that means we lose the graceful shutdown feature, which makes our shutdowns take longer than they need to.