Type: Feature Request
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: openshift-4.20
Component/s: Hosted Control Planes
Labels:
- cee.next_proposed
- nodepool

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:

None
Products:
None

1. Proposed title of this feature request

Apply rebootless configuration changes in-place for NodePools with 'Replace' upgrade strategy

2. What is the nature and description of the request?

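This request asks for a change in the behavior of HyperShift NodePools configured with the Replace upgrade strategy (set in the NodePool API as spec.management.upgradeType: Replace, and referred to in this request as upgradeStrategy).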

Current Behavior: When any change is made to the NodePool or HostedCluster specification (e.g., adding an additionalTrustBundle), NodePools using the Replace strategy will trigger a full, rolling replacement of every worker node. This involves draining, terminating, and recreating each machine, even if the change itself does not require a reboot (e.g., updating the node's CA trust store).

Desired Behavior: The HyperShift control plane should differentiate between changes that require a node replacement (such as an OS image change or a kubelet version upgrade) and "rebootless" configuration changes (such as additionalTrustBundle, labels, or taints).

If a change is identified as "rebootless," it should be applied in-place to the existing worker nodes, bypassing the Replace strategy's drain/recreate cycle. The Replace strategy should only be invoked for changes that fundamentally cannot be applied to a running node.
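The desired decision logic can be sketched as follows. This is a minimal illustration, not HyperShift code: the function name plan_rollout, the Action enum, and the field names are hypothetical, and the set of "rebootless" fields is taken only from the examples given in this request. Any field not known to be rebootless falls back to replacement, so an unclassified change is never applied in place by mistake.

```python
from enum import Enum


class Action(Enum):
    IN_PLACE = "in-place"
    REPLACE = "replace"


# Assumption: illustrative field names based on the examples in this request,
# not actual HyperShift API paths.
REBOOTLESS_FIELDS = frozenset({"additionalTrustBundle", "labels", "taints"})


def plan_rollout(changed_fields):
    """Decide how a set of changed fields should be rolled out.

    Returns None when nothing changed, Action.IN_PLACE when every changed
    field is known to be rebootless, and Action.REPLACE otherwise.
    """
    changed = set(changed_fields)
    if not changed:
        return None
    if changed <= REBOOTLESS_FIELDS:
        return Action.IN_PLACE
    # Conservative default: unknown or disruptive fields trigger replacement.
    return Action.REPLACE
```

For example, plan_rollout(["additionalTrustBundle"]) yields Action.IN_PLACE, while plan_rollout(["labels", "osImage"]) yields Action.REPLACE because one of the changed fields is not in the rebootless set.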

3. Why does the customer need this? (List the business requirements here)

The current behavior creates significant operational inefficiencies and instability for large clusters.

Reduce Operational Cost and Time: For large clusters (e.g., HCP on KubeVirt with 100+ worker nodes), replacing every node for a minor change like adding a CA certificate is extremely time-consuming and resource-intensive. A change that takes moments on a NodePool using the InPlace strategy can take hours on one using the Replace strategy.

Improve Cluster Availability and Stability: A full node replacement cycle involves mass pod drains and rescheduling. This causes unnecessary application disruption and potential downtime for a change that should be non-disruptive.

Enable Operational Agility: Administrators are hesitant to apply simple, necessary configuration updates (like trusting a new internal CA) because they know it will trigger a massive, disruptive cluster event. This change would allow them to perform such tasks quickly and safely.

Provide Logical Consistency: The upgradeStrategy should primarily govern upgrades. It should not force a disruptive replacement for a simple configuration update that the system is clearly capable of handling in-place (as demonstrated by the InPlace strategy).
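To put the "hours" claim in the cost-and-time item above into rough numbers, the estimate below computes the duration of a rolling replacement. The per-node time and the degree of parallelism are assumed values for illustration only, not measurements from any cluster, and rollout_minutes is a hypothetical helper.

```python
import math


def rollout_minutes(node_count, minutes_per_node, max_parallel=1):
    """Estimate the wall-clock minutes for a rolling node replacement.

    Nodes are replaced in batches of at most max_parallel, and each batch
    is assumed to take minutes_per_node (drain, terminate, recreate, join).
    """
    if node_count < 0 or minutes_per_node < 0 or max_parallel < 1:
        raise ValueError("invalid rollout parameters")
    batches = math.ceil(node_count / max_parallel)
    return batches * minutes_per_node


# Assumed figures: 100 nodes, 10 minutes per node.
print(rollout_minutes(100, 10))                   # 1000 minutes, one node at a time
print(rollout_minutes(100, 10, max_parallel=5))   # 200 minutes, five nodes at a time
```

Under these assumptions the replacement takes between roughly 3 and 17 hours, whereas an in-place update of a trust bundle involves no drain or recreate step at all.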

4. List any affected packages or components.

Hosted Control Plane

Assignee: Ramon Acedo

Reporter: Divyam Pateriya

Need Info From: None

Votes: 0

Watchers: 1

Created: 2025/11/13 4:38 PM

Updated: 2025/11/13 7:08 PM

Target start: None

Target end: None
