Type: Feature Request
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: ARO, Hosted Control Planes
Labels: None
Target Version: None
Activity Type: Product / Portfolio Work
Status Summary: None
Blocked: False
Blocked Reason: None
Products: None
Hierarchy Progress Bar: None
PX Review Complete: None
PX Impact Range: None
PX Priority Data: None
PX Technical Impact: None
PX Technical Impact Notes: None
PX Scheduling Request: None

1. Proposed Title:

Hypershift: Automatic PDB bypass mechanism for unhealthy Hosted Control Planes during management cluster upgrades

2. Nature and Description of the Request:

ARO-HCP (and other managed Hypershift offerings) requires a mechanism to bypass Pod Disruption Budgets (PDBs) for Hosted Control Planes that are in a fundamentally unhealthy state, specifically when those PDBs block node drain operations on the management cluster during infrastructure upgrades.

Proposed behavior:

Services (ARO, ROSA, etc.) should detect when a Hosted Control Plane is fundamentally unhealthy (e.g., complete kube-apiserver loss, persistent crash loops, or an unrecoverable state).
When a Service identifies an unhealthy HCP and a management cluster upgrade/drain is blocked as a result, the Service needs a way to tell Hypershift that the PDBs for that specific hosted cluster may be bypassed. The Service would set a signal on that HCP instance (e.g., "force-drain allowed") that removes the PDBs for that instance until the signal is removed.
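The set/clear lifecycle of such a signal could look like the following minimal sketch. It assumes the signal is exposed as an annotation on the HostedCluster. The request also considers a new HostedCluster API field instead. The annotation key `example.openshift.io/force-drain-allowed` and the helper `force_drain_patch` are hypothetical. Hypershift does not define either of them today.

```shell
#!/usr/bin/env bash
# HYPOTHETICAL annotation key used as the "force-drain allowed" signal.
# Hypershift does not define this key; it is an assumption for illustration.
FORCE_DRAIN_ANNOTATION="example.openshift.io/force-drain-allowed"

# force_drain_patch prints a JSON merge patch (RFC 7386) for a HostedCluster.
#   force_drain_patch true   -> sets the signal
#   force_drain_patch false  -> removes the signal (a null value deletes the key)
force_drain_patch() {
  case "$1" in
    true)
      printf '{"metadata":{"annotations":{"%s":"true"}}}\n' "$FORCE_DRAIN_ANNOTATION"
      ;;
    false)
      printf '{"metadata":{"annotations":{"%s":null}}}\n' "$FORCE_DRAIN_ANNOTATION"
      ;;
    *)
      echo "usage: force_drain_patch true|false" >&2
      return 1
      ;;
  esac
}
```

A Service could then run `kubectl patch hostedcluster -n "${HCNS}" "${CLUSTER_NAME}" --type=merge -p "$(force_drain_patch true)"` to set the signal and repeat the same command with `false` to clear it. Under merge-patch semantics, a `null` value deletes the key. Clearing the signal therefore removes the annotation entirely, which would let Hypershift recreate the PDBs.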

3. Business Requirements:

Live-service availability: Management clusters must be upgradeable on a predictable schedule to address security vulnerabilities, apply RFEs, and maintain SLAs. A single broken customer control plane cannot be allowed to block upgrades for the entire management cluster.
Security posture: Delayed upgrades due to stuck PDBs extend exposure windows for CVEs affecting management cluster components.
Operational efficiency: SRE teams currently must intervene manually to identify and work around these situations, which increases toil and incident response time.

4. Affected Packages/Components:

hypershift (core operator logic, PDB creation/management)
hypershift/control-plane-operator (health detection, PDB lifecycle)
HostedCluster API (potential new field for bypass authorization)

NOTE:

There is an existing workaround:

`kubectl patch hostedcluster -n "${HCNS}" "${CLUSTER_NAME}" -p '{"spec":{"pausedUntil":"true"}}' --type="merge"`

However, this is not ideal because it stops all reconciliation for the hosted cluster. In production we would prefer a more targeted control.

Links to:

openshift/hypershift#7647: Support disabling PDBs on hosted clusters

Assignee: Jerome Boutaud
Reporter: Brendan Bergen
Need Info From: None
Votes: 0
Watchers: 2
Created: 2026/02/04 8:33 PM
Updated: 2026/02/13 10:08 AM
Target start: None
Target end: None
