Epic
Resolution: Unresolved
Normal
Enforce PDB unhealthyEvictionPolicy in OpenShift
To Do
Owners of all PDBs in OpenShift should consider setting .spec.unhealthyPodEvictionPolicy to AlwaysAllow.
It allows eviction of unhealthy (running but not ready) pods even if the PodDisruptionBudget currently allows no disruptions. This helps to drain or maintain a node and to recover without manual intervention when multiple nodes or pods are misbehaving. Use it with caution, as this option can disrupt pods that have not yet had a chance to become healthy.
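The difference between AlwaysAllow and the default IfHealthyBudget policy can be illustrated with a simplified model of the eviction decision. This is a sketch based on the documented Kubernetes semantics, not the API server's actual code. The function name `can_evict` and its parameters are hypothetical; the numeric arguments mirror `.status.currentHealthy`, `.status.desiredHealthy`, and `.status.disruptionsAllowed` of a PDB.

```python
ALWAYS_ALLOW = "AlwaysAllow"
IF_HEALTHY_BUDGET = "IfHealthyBudget"  # the Kubernetes default


def can_evict(pod_ready, policy, current_healthy, desired_healthy,
              disruptions_allowed):
    """Simplified model: may a running pod guarded by a PDB be evicted?

    Assumption: the pod is in the Running phase and matched by exactly
    one PDB. Real eviction handling covers more cases than this sketch.
    """
    if pod_ready:
        # Healthy pods always consume the disruption budget.
        return disruptions_allowed > 0
    if policy == ALWAYS_ALLOW:
        # Unhealthy pods are treated as already disrupted, so they can
        # be evicted regardless of the budget.
        return True
    # IfHealthyBudget: unhealthy pods can be evicted only while the
    # guarded application itself is not disrupted.
    return current_healthy >= desired_healthy
```

For example, with three replicas, two healthy pods desired, and only one currently healthy, `can_evict(False, "IfHealthyBudget", 1, 2, 0)` is false and blocks the drain, while the same call with `"AlwaysAllow"` is true.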
Example PRs:
- https://github.com/openshift/cluster-kube-apiserver-operator/pull/1579/commits/c331df37b077e26fd163cdd9dab895fac989fe80
- https://github.com/openshift/library-go/pull/1614
The default policy, IfHealthyBudget, should be used only in special cases where operand availability is critical. For example, etcd: https://github.com/openshift/cluster-etcd-operator/pull/1171/commits/647af2f5002a4f6c5846e885eb2643916394a21e
EDIT: etcd in OCP should be fine with AlwaysAllow, but it is possibly problematic in HyperShift? https://redhat-internal.slack.com/archives/CKJR6200N/p1724775333783439
The IfHealthyBudget policy causes the least disruption, because it does not allow eviction when multiple etcd pods do not report readiness. The trade-off is that it can block node drain or maintenance. The cluster administrator then has to analyze these pods and decide which one to bring down manually.
- This should be communicated to all PDB owners.
- Usage of the AlwaysAllow policy should be enforced by a test, with only a handful of exceptions (e.g. etcd).
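The enforcement test proposed above could look roughly like the following sketch. It assumes the PDB manifests are already loaded as plain dictionaries (for example, from the cluster API or from rendered operator manifests). The function name `find_violations` and the allowlist contents are hypothetical and serve only as an illustration.

```python
ALWAYS_ALLOW = "AlwaysAllow"

# Hypothetical allowlist of (namespace, name) pairs that may keep the
# default IfHealthyBudget policy. The entry below is illustrative only.
EXCEPTIONS = frozenset({("openshift-etcd", "etcd-guard-pdb")})


def find_violations(pdbs, exceptions=EXCEPTIONS):
    """Return sorted (namespace, name) pairs of PDBs not using AlwaysAllow."""
    violations = []
    for pdb in pdbs:
        meta = pdb.get("metadata", {})
        key = (meta.get("namespace", ""), meta.get("name", ""))
        if key in exceptions:
            continue
        # An unset field means the default, IfHealthyBudget, is in effect.
        policy = pdb.get("spec", {}).get("unhealthyPodEvictionPolicy")
        if policy != ALWAYS_ALLOW:
            violations.append(key)
    return sorted(violations)


# Made-up sample manifests, reduced to the fields the check reads.
SAMPLE_PDBS = [
    {"metadata": {"namespace": "ns-a", "name": "good"},
     "spec": {"unhealthyPodEvictionPolicy": "AlwaysAllow"}},
    {"metadata": {"namespace": "ns-b", "name": "unset"},
     "spec": {"minAvailable": 1}},
    {"metadata": {"namespace": "openshift-etcd", "name": "etcd-guard-pdb"},
     "spec": {"unhealthyPodEvictionPolicy": "IfHealthyBudget"}},
]
```

A test would then assert that `find_violations(...)` returns an empty list for the PDBs it inspects. With the sample data, only the PDB that leaves the field unset is reported, because the etcd entry is on the allowlist.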
Further reading:
is related to:
- IR-486 Set image-registry pod disruption budget .spec.unhealthyPodEvictionPolicy to AlwaysAllow (To Do)
relates to:
- RFE-6211 Allow evicting unhealthy hosted control-plane pods (Backlog)
- RFE-1367 Forcefully remove unhealthy pods controlled by PDB during Machine Update (Deferred)
- OCPBUGS-23796 not possible to drain a master node after multiple master nodes experience network disruption (Closed)