OpenShift Request For Enhancement / RFE-8927

Sequential Node Reboot Across Multiple NodePools in HCP clusters


      1. Proposed title of this feature request

      Sequential Node Replacement Across Multiple NodePools in HCP clusters

      2. What is the nature and description of the request?

      Currently, HCP clusters upgrade or reboot nodes on a per-NodePool basis, replacing one node at a time within each NodePool.

      However, separate NodePools are upgraded in parallel, so more than one node across the cluster can be unavailable at the same time.

      The request is to provide a cluster-level option to serialize node replacements across all NodePools, so that at any time only one node in the entire cluster is being upgraded or rebooted, regardless of how many NodePools exist. This behavior should apply to both automated upgrades and manual maintenance operations.
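      To illustrate the difference between today's behavior and the requested one, the Go sketch below simulates rolling node replacement across several NodePools. It is a toy model under stated assumptions, not HyperShift code: the function replaceNodes, the pool names, and the single mutex standing in for a cluster-level serialization lock are all hypothetical, and a short sleep stands in for drain, reboot, and rejoin. Each pool replaces one node at a time and pools run in parallel; when the cluster-wide lock is enabled, every replacement must also hold it, so the peak number of simultaneously unavailable nodes drops to 1.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// replaceNodes is a hypothetical simulation of rolling node replacement.
// pools maps a NodePool name to its node count. Each pool replaces one node
// at a time, and pools roll out in parallel. If clusterWide is true, a single
// cluster-level lock (an assumption standing in for the requested option)
// gates every replacement. It returns the peak number of nodes observed
// unavailable at the same time.
func replaceNodes(pools map[string]int, clusterWide bool) int64 {
	var clusterLock sync.Mutex // hypothetical cluster-wide serialization
	var unavailable, peak int64
	var wg sync.WaitGroup

	for _, nodeCount := range pools {
		wg.Add(1)
		go func(n int) { // each NodePool rolls out concurrently
			defer wg.Done()
			for i := 0; i < n; i++ { // one node at a time within this pool
				if clusterWide {
					clusterLock.Lock()
				}
				cur := atomic.AddInt64(&unavailable, 1)
				for { // record the highest concurrency seen so far
					p := atomic.LoadInt64(&peak)
					if cur <= p || atomic.CompareAndSwapInt64(&peak, p, cur) {
						break
					}
				}
				time.Sleep(5 * time.Millisecond) // stand-in for drain, reboot, rejoin
				atomic.AddInt64(&unavailable, -1)
				if clusterWide {
					clusterLock.Unlock()
				}
			}
		}(nodeCount)
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	pools := map[string]int{"pool-a": 3, "pool-b": 3, "pool-c": 3}
	fmt.Println("per-NodePool only:", replaceNodes(pools, false))
	fmt.Println("cluster-wide lock:", replaceNodes(pools, true))
}
```

With the cluster-wide lock the reported peak is always 1. Without it, the peak is timing-dependent but bounded by the number of NodePools, which is exactly the multi-node exposure this request aims to remove.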

      3. Why does the customer need this? (Business requirements)

      1. Prevent temporary workload outages
        • Workloads with low replica counts that rely on topologySpreadConstraints or PodDisruptionBudgets can become unavailable if nodes in multiple pools are rebooted simultaneously.
      2. Ensure high availability during upgrades
        • Critical applications require at least one pod to be available at all times. Parallel node reboots across pools can violate this requirement.
      3. Support enterprise operational policies
        • Some organizations require strict control over maintenance activities to meet internal SLAs or compliance requirements.
      4. Reduce operational risk
        • Minimizes the chance of simultaneous node loss across NodePools during upgrades, patching, or emergency maintenance.

      4. List any affected packages or components

      HCP NodePools

      Cluster Machine Management / HyperShift NodePool Operator

      Rolling upgrade mechanism (management.replace.rollingUpdate)

      Worker node replacement workflows

              gausingh@redhat.com Gaurav Singh
              rhn-support-sdharma Suruchi Dharma