OpenShift Request For Enhancement: RFE-8290

Support for deterministic worker node deletion (scale-down) in Hosted Control Plane (HCP) clusters


    • Feature Request
    • Resolution: Unresolved
    • Hosted Control Planes, ROSA
    • Product / Portfolio Work

      1. Proposed title of this feature request

      Support for deterministic worker node deletion (scale-down) in Hosted Control Plane (HCP) clusters

      2. What is the nature and description of the request?

      The request is to implement a supported, deterministic mechanism within Hosted Control Plane (HCP) clusters that allows a user to specify which individual worker node should be removed when scaling down a Machine Pool.

      Currently, scaling down a Machine Pool (reducing the replica count) results in the random removal of a worker node, because HCP does not expose the underlying OpenShift Machine resources that enable targeted deletion via annotations in ROSA Classic and self-managed OpenShift.
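
For comparison, the annotation-based approach mentioned above can be sketched as follows. This is a minimal illustration, assuming a self-managed cluster where Machine resources live in the `openshift-machine-api` namespace and the `machine.openshift.io/delete-machine` annotation is honored. The function name and the machine and machine set names are hypothetical placeholders, and the code only builds the `oc` commands without running them.

```python
# Sketch only: builds the oc commands for an annotation-based targeted
# scale-down where Machine resources are exposed (i.e. not on HCP).
# The machine and machine set names passed in are placeholders.

MACHINE_API_NAMESPACE = "openshift-machine-api"
DELETE_ANNOTATION = "machine.openshift.io/delete-machine"


def targeted_scale_down_commands(machine, machineset, new_replicas):
    """Return the two oc invocations as argv lists, in the order to run them."""
    if new_replicas < 0:
        raise ValueError("new_replicas must be non-negative")
    # Mark the Machine that should be removed first on the next scale-down.
    annotate = [
        "oc", "annotate", f"machine/{machine}",
        "-n", MACHINE_API_NAMESPACE,
        f"{DELETE_ANNOTATION}=true",
    ]
    # Lower the replica count so the annotated Machine is the one deleted.
    scale = [
        "oc", "scale", f"machineset/{machineset}",
        "-n", MACHINE_API_NAMESPACE,
        f"--replicas={new_replicas}",
    ]
    return [annotate, scale]


if __name__ == "__main__":
    for argv in targeted_scale_down_commands("worker-a-abc12", "worker-a", 2):
        print(" ".join(argv))
```

HCP clusters have no equivalent user-facing step, which is the gap this request describes.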

      The desired solution should be a reliable, graceful operation that performs the following steps on the backend:

      1. Accept a target node name for deletion.
      2. Cordon and drain the specified node, respecting Pod Disruption Budgets (PDBs).
      3. Delete the node's underlying VM.
      4. Reduce the desired replica count of the associated Machine Pool by one, ensuring the cluster remains scaled down and does not attempt to provision a replacement node.
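
The intended semantics of these steps can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not a proposed implementation: `Node`, `MachinePool`, `DrainBlocked`, `remove_node`, and the `can_evict` callback (a stand-in for PDB checks) are all hypothetical names, and no real cluster or cloud API is involved.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    schedulable: bool = True
    pods: list = field(default_factory=list)  # names of pods on this node


@dataclass
class MachinePool:
    replicas: int
    nodes: dict = field(default_factory=dict)  # node name -> Node


class DrainBlocked(Exception):
    """Raised when evicting a pod would violate a disruption budget."""


def remove_node(pool, node_name, can_evict=lambda pod: True):
    """Remove exactly the named node and shrink the pool by one.

    can_evict is a hypothetical stand-in for Pod Disruption Budget checks.
    """
    # Step 1: accept the target node name and reject unknown nodes.
    node = pool.nodes.get(node_name)
    if node is None:
        raise KeyError(f"node {node_name!r} is not in this machine pool")

    # Step 2: cordon, then drain only if every pod may be evicted.
    node.schedulable = False
    blocked = [pod for pod in node.pods if not can_evict(pod)]
    if blocked:
        # The node stays cordoned and the pool is left unchanged.
        raise DrainBlocked(f"cannot evict: {blocked}")
    node.pods.clear()

    # Steps 3 and 4: delete the node and lower the desired count together,
    # so no replacement is provisioned for the removed node.
    del pool.nodes[node_name]
    pool.replicas -= 1
    return pool
```

For example, calling `remove_node(pool, "b")` on a three-node pool leaves the other two nodes in place with `pool.replicas == 2`, while a blocked drain leaves the replica count untouched.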

      Attempting to achieve this manually (cordon, drain, delete the EC2 instance, and then scale down the Machine Pool) has been shown to be unreliable and non-deterministic (e.g., stuck nodes waiting for manual CSR approval, new machines still getting provisioned).

       

      3. Why does the customer need this? (List the business requirements here)

      • Remove a specific node that was added for testing purposes.
      • Avoid the risks of the current non-deterministic method, which could randomly delete a node hosting critical, difficult-to-move, or stateful workloads, or workloads with only one replica.

       

      4. List any affected packages or components.

      • HCP

       

       

              racedoro@redhat.com Ramon Acedo
              rh-ee-dcoronel David Coronel