OpenShift Request For Enhancement: RFE-7912

[HCP] Enable Rebootless Pull Secret Updates in HostedClusters

      1. Proposed title of this feature request
      Enable Rebootless Pull Secret Updates in HostedClusters

      2. What is the nature and description of the request?

      Currently, updating the pull secret for a HostedCluster by modifying the hostedcluster.spec.pullSecret.name field requires a full node reboot or replacement. The exact behavior is determined by the nodepool.spec.management.upgradeType setting (InPlace triggers a reboot, Replace triggers a new node rollout).

      This process is disruptive and inconsistent with standard OpenShift Container Platform (OCP) behavior, where the global pull secret can be updated without rebooting nodes. The current implementation for HostedClusters leads to significant operational challenges, especially in large-scale environments.

      I propose that updating the pull secret specified in hostedcluster.spec.pullSecret.name should be a rebootless operation. The change should be applied to all nodes in the associated NodePools without requiring them to be drained, rebooted, or replaced. This would align the functionality of HostedClusters with that of standard OCP clusters.
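
      For reference, the two fields named above sit in the HostedCluster and NodePool resources roughly as sketched below. This is an illustrative fragment, not a complete manifest: the resource names and namespace are hypothetical, the apiVersion is assumed to be hypershift.openshift.io/v1beta1, and all other required spec fields are omitted.

```yaml
# Illustrative fragments only. Names and namespace are hypothetical,
# and other required spec fields are left out for brevity.
apiVersion: hypershift.openshift.io/v1beta1   # assumed API version
kind: HostedCluster
metadata:
  name: example-hc            # hypothetical
  namespace: clusters         # hypothetical
spec:
  pullSecret:
    name: example-pull-secret-v2   # pointing this at a different Secret is the update in question
---
apiVersion: hypershift.openshift.io/v1beta1   # assumed API version
kind: NodePool
metadata:
  name: example-np            # hypothetical
  namespace: clusters         # hypothetical
spec:
  clusterName: example-hc
  management:
    upgradeType: InPlace      # InPlace reboots nodes; Replace rolls out new nodes
```

      With the behavior requested here, changing spec.pullSecret.name would propagate the new credentials to the nodes of example-np without either rollout path being triggered.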

      3. Why does the customer need this? (List the business requirements here)

      Implementing a rebootless pull secret update offers several key advantages:

      Ensure High Availability: A rebootless update prevents nodes from being drained and pods from being evicted. This eliminates workload disruption and ensures service continuity, which is critical for production environments.

      Improve Consistency with OCP: Core functionalities should be consistent across OCP and HostedCluster deployments. Aligning this behavior will lower the learning curve for customers adopting HostedClusters and create a more seamless user experience.

      Reduce Support Workload: This change would prevent a significant number of support cases that arise from nodes getting stuck in a Draining state during reboots or from failures during the provisioning of new nodes in a Replace scenario. This reduces the burden on support engineering teams.

      Enhance Operational Efficiency: By removing the need for reboots, customers no longer need to schedule dedicated maintenance windows for a simple credential update. This provides greater operational flexibility and reduces administrative overhead.

      Use Case Scenario:

      Consider a HostedCluster with 100+ worker nodes. When the pull secret needs to be updated:

      With upgradeType: InPlace, all 100+ nodes must be sequentially drained and rebooted. This is an extremely time-consuming process that poses a significant risk of nodes failing to come back online, requiring manual intervention.

      With upgradeType: Replace, the customer must provision an additional 100+ worker nodes, which can be a major challenge due to resource constraints, quota limitations, and infrastructure costs.

      A rebootless update mechanism would remove both problems: the update would no longer require draining, rebooting, or provisioning any nodes, so it stays quick and safe regardless of cluster size.

      4. List any affected packages or components.
      HyperShift

              racedoro@redhat.com Ramon Acedo
              rhn-support-dpateriy Divyam Pateriya