Type: Feature Request
Resolution: Obsolete
Category: Product / Portfolio Work
1. Proposed title of this feature request
Remove PDBs when the kube-apiserver is fully down
2. What is the nature and description of the request?
When the kube-apiserver pods of a hosted control plane (HCP) are fully down, the existing PodDisruptionBudgets (PDBs) in an HA setup prevent draining the HCP pods. This is undesirable: at that point the PDBs no longer effectively protect the workloads from disruption.
This happens frequently in ROSA HCP and ARO HCP. A well-known issue is the following:
- The HCP has etcd encryption enabled
- The persona using the HCP broke access to the encrypt/decrypt calls (e.g. broke the KMS key permissions, deleted the key, or deleted the OIDC provider)
- The kube-apiserver can no longer encrypt/decrypt etcd data and the pod is stuck in a 6/7 container state, with the kube-apiserver container never starting up
We explored the unhealthyPodEvictionPolicy PDB field in the past as part of RFE-6211, but it does not allow draining the control plane pods on its own. The kube-apiserver changing from an available to an unavailable state might previously have allowed dependent pods (e.g. openshift-apiserver) to start. However, following a drain and a restart on a new node, these pods may no longer start because they cannot reach the kube-apiserver during their initialization phase.
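For reference, unhealthyPodEvictionPolicy is set on the PDB spec. A minimal sketch of the variant explored in RFE-6211 (the names and namespace here are illustrative, not the actual HCP manifests):

```yaml
# Hypothetical PDB for an HCP kube-apiserver; names are illustrative.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kube-apiserver
  namespace: clusters-example-hcp
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: kube-apiserver
  # Permits evicting pods that are Running but not Ready, without
  # counting them against the budget. It does not help here: with all
  # replicas down, evictions would still violate minAvailable.
  unhealthyPodEvictionPolicy: AlwaysAllow
```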
ROSA HCP currently works around this with a custom job that deletes all pods on a draining node whose parent HCP's kube-apiserver is fully down.
The generic proposal: remove the PDBs of an HCP when the kube-apiserver of that HCP is fully down.
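The "fully down" check at the heart of the proposal can be sketched as follows. This is only an illustration of the decision logic, assuming a controller loop in the control plane operator; the function name and the pod-status shape are hypothetical, not an existing API:

```python
# Sketch of the proposed "fully down" check; names are hypothetical.

def kube_apiserver_fully_down(pods):
    """Return True when no kube-apiserver pod has a ready
    kube-apiserver container.

    `pods` mimics the relevant slice of the Kubernetes PodStatus, e.g.:
      {"containerStatuses": [{"name": "kube-apiserver", "ready": False}]}
    """
    for pod in pods:
        for cs in pod.get("containerStatuses", []):
            if cs.get("name") == "kube-apiserver" and cs.get("ready"):
                return False
    return True

# A 3-replica HA setup stuck with the kube-apiserver container
# failing in every pod (the 6/7 state described above).
stuck = [
    {"containerStatuses": [{"name": "kube-apiserver", "ready": False}]}
    for _ in range(3)
]
print(kube_apiserver_fully_down(stuck))  # True

# A single healthy replica is enough to keep the PDBs in place.
partially_up = stuck[:2] + [
    {"containerStatuses": [{"name": "kube-apiserver", "ready": True}]}
]
print(kube_apiserver_fully_down(partially_up))  # False
```

When the check holds, the controller would delete (or relax) the PDBs in the HCP namespace, and restore them once a kube-apiserver replica is ready again.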
3. Why does the customer need this?
Facilitate draining nodes on the management cluster without increasing the risk to the availability of hosted services.
This allows faster node drains without hands-on operations to get things unstuck.
4. List any affected packages or components.
CPO (control-plane-operator)
Previous RFEs that are looking to achieve the same goal:
https://issues.redhat.com/browse/RFE-6211
https://issues.redhat.com/browse/RFE-8779
I believe we can close those once this is accepted.