Feature Request
Resolution: Unresolved
Major
Product / Portfolio Work
1. Proposed title of this feature request
Sequential Node Replacement Across Multiple NodePools in HCP clusters
2. What is the nature and description of the request?
Currently, HCP clusters upgrade or reboot nodes per NodePool, replacing one node at a time within each NodePool.
However, multiple NodePools are upgraded in parallel, which can result in more than one node being unavailable across the cluster simultaneously.
The request is to provide a cluster-level option to serialize node replacements across all NodePools, so that at any time only one node in the entire cluster is being upgraded or rebooted, regardless of how many NodePools exist. This behavior should apply to both automated upgrades and manual maintenance operations.
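No cluster-level setting for this exists today. One possible shape of the requested option, purely illustrative (the nodeReplacementPolicy field below is a hypothetical placeholder, not an existing HyperShift API), could be:

```yaml
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: example
  namespace: clusters
spec:
  # Hypothetical field illustrating the requested behavior:
  # at most one node across ALL NodePools is replaced at a time,
  # for both automated upgrades and manual maintenance.
  nodeReplacementPolicy:
    serialization: Cluster   # vs. the current per-NodePool behavior
```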
3. Why does the customer need this? (Business requirements)
- Prevent temporary workload outages
  - Workloads with low replica counts and topologySpreadConstraints or PodDisruptionBudgets can fail if nodes across multiple pools are rebooted simultaneously.
- Ensure high availability during upgrades
  - Critical applications require at least one pod available at all times; parallel node reboots across pools can violate this requirement.
- Support enterprise operational policies
  - Some organizations require strict control over maintenance activities to meet internal SLA or compliance requirements.
- Reduce operational risk
  - Minimizes the chance of simultaneous node loss across NodePools during upgrades, patching, or emergency maintenance.
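To illustrate the failure mode described above: a two-replica workload spread across pools can be reduced to zero available replicas during parallel node replacement, despite a PodDisruptionBudget. A minimal sketch (names are illustrative):

```yaml
# A PDB requiring at least one available replica of the app.
# If the two replicas land on nodes in different NodePools and both
# pools replace those nodes in parallel, the cluster can still pass
# through a window with no available replicas, which serialized
# cluster-wide replacement would avoid.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: critical-app-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: critical-app
```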
4. List any affected packages or components
- HCP NodePools
- Cluster Machine Management / HyperShift NodePool Operator
- Rolling upgrade mechanism (management.replace.rollingUpdate)
- Worker node replacement workflows
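For reference, serialization can currently only be expressed per NodePool, via the rolling-update strategy named above. A sketch of that existing per-pool control (cluster and pool names are illustrative):

```yaml
apiVersion: hypershift.openshift.io/v1beta1
kind: NodePool
metadata:
  name: pool-a
  namespace: clusters
spec:
  clusterName: example
  replicas: 3
  management:
    upgradeType: Replace
    replace:
      strategy: RollingUpdate
      rollingUpdate:
        # At most one node in THIS pool is replaced at a time,
        # but each NodePool applies this limit independently, so
        # several pools can still replace nodes simultaneously.
        maxUnavailable: 0
        maxSurge: 1
```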