OpenShift Container Platform (OCP) Strategy / OCPSTRAT-2514

Support for Day-2 Control Plane Node Replacement Procedure


Product / Portfolio Work

      Feature Overview

      This feature delivers a fully supported, validated, and officially documented procedure that cluster administrators can use to safely and reliably replace a single failed control plane (master) node in an OpenShift Container Platform (OCP) cluster. Standardizing this procedure moves a critical operational task from a workaround documented by the community and Red Hat Support to an official product capability. This improves cluster maintainability and administrator confidence.

      Goals


      • Key Goal: Provide a fully supported procedure for Day-2 control plane node replacement that ensures cluster stability and maintainability.
      • Primary User: The target user is the Cluster Administrator responsible for the Day-1 and Day-2 lifecycle management of the OpenShift cluster.
      • Existing Functionality Improvement: The existing recommended procedure, currently published as a Red Hat Support Solution, becomes a fully supported and officially documented part of the OpenShift documentation, making it authoritative and easier to discover.

      Requirements

      A list of specific needs or objectives that this feature must deliver in order to be considered complete.

      Functional requirements:

      • The procedure must replace a single failed control plane node while the cluster remains operational (the remaining control plane nodes must maintain etcd quorum throughout).
      • The replacement procedure must be validated for clusters installed with installer-provisioned infrastructure (IPI), user-provisioned infrastructure (UPI), the Assisted Installer, or the Agent-based Installer (ABI).
      • The final, validated procedure must be published in the official OpenShift documentation (for example, in the "Installing an on-premise cluster with the Agent-based Installer" documentation).
      • The procedure must be tested with OpenShift Y-stream releases and validated against OpenShift 4.19 and later.
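      The quorum requirement above comes down to simple arithmetic. etcd needs a strict majority of its members (floor(n/2) + 1) to stay available, so a three- or four-member control plane tolerates one failure and a five-member control plane tolerates two. The sketch below shows a hypothetical pre-check that accepts a replacement only when exactly one member is unhealthy and the rest still hold quorum. Multiple simultaneous failures are rejected because they are out of scope for this feature. The function names and the input shape (a map from node name to etcd member health) are illustrative assumptions and are not part of any OpenShift API or tool.

```python
# Hypothetical pre-check sketch: function names and the input shape are
# illustrative only and are not part of any OpenShift API or CLI.


def quorum_size(members: int) -> int:
    """Minimum number of healthy etcd members needed for quorum (a strict majority)."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """How many members can fail while quorum is still maintained."""
    return members - quorum_size(members)


def can_replace_single_member(health: dict[str, bool]) -> bool:
    """Return True if exactly one member is unhealthy and the others still hold quorum.

    `health` maps a control plane node name to whether its etcd member is healthy.
    Multiple simultaneous failures return False, even if quorum survives,
    because they fall outside the single-node replacement scope.
    """
    total = len(health)
    healthy = sum(1 for ok in health.values() if ok)
    unhealthy = total - healthy
    return unhealthy == 1 and healthy >= quorum_size(total)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} members: quorum={quorum_size(n)}, tolerates {fault_tolerance(n)} failure(s)")
    # One failed member in a three-node control plane: in scope.
    print(can_replace_single_member({"master-0": True, "master-1": True, "master-2": False}))
    # Two failed members in a three-node control plane: quorum lost, out of scope.
    print(can_replace_single_member({"master-0": True, "master-1": False, "master-2": False}))
```

In practice, the health map would have to be built from the cluster's own etcd health reporting. How that data is gathered is deliberately left out of this sketch.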

      Non-Functional requirements:

      • Usability/Clarity: The final documented procedure must be clear, step-by-step, and executable by a Cluster Administrator with intermediate OpenShift operational experience.
      • Reliability: The procedure must consistently restore the control plane to a healthy three-node state (or a four- or five-node state, where applicable) with no residual impact on cluster services or configuration.
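      The reliability requirement implies a final verification step. The sketch below is a hypothetical check that the control plane is back to its expected size (three, four, or five nodes) and that every node is Ready and runs a healthy etcd member. It assumes the per-node status has already been collected, for example from `oc get nodes` and the cluster's etcd health reporting. The `ControlPlaneMember` type and `control_plane_restored` function are illustrative names, not part of any OpenShift API.

```python
# Hypothetical post-replacement check: ControlPlaneMember and
# control_plane_restored are illustrative, not an OpenShift API.
from dataclasses import dataclass


@dataclass
class ControlPlaneMember:
    node_name: str
    node_ready: bool    # assumed: the node reports a Ready condition of True
    etcd_healthy: bool  # assumed: the etcd member on this node reports healthy


# Control plane sizes named in the reliability requirement.
SUPPORTED_SIZES = {3, 4, 5}


def control_plane_restored(members: list[ControlPlaneMember], expected_size: int) -> list[str]:
    """Return a list of problems; an empty list means the control plane looks restored."""
    problems = []
    if expected_size not in SUPPORTED_SIZES:
        problems.append(f"unexpected control plane size {expected_size}")
    if len(members) != expected_size:
        problems.append(f"expected {expected_size} control plane nodes, found {len(members)}")
    names = [m.node_name for m in members]
    if len(set(names)) != len(names):
        problems.append("duplicate control plane node names")
    for m in members:
        if not m.node_ready:
            problems.append(f"{m.node_name}: node is not Ready")
        if not m.etcd_healthy:
            problems.append(f"{m.node_name}: etcd member is not healthy")
    return problems


if __name__ == "__main__":
    after_replacement = [
        ControlPlaneMember("master-0", True, True),
        ControlPlaneMember("master-1", True, True),
        ControlPlaneMember("master-3", True, False),  # replacement node still syncing
    ]
    for problem in control_plane_restored(after_replacement, expected_size=3):
        print(problem)
```

Returning a list of problems rather than a single boolean lets an administrator see every outstanding issue at once.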

      Use Case

      As a Cluster Administrator, I want an officially supported procedure for Day-2 control plane node replacement. When a control plane node goes down for any reason and I need to provision a new one, I can then quickly and confidently restore the control plane's high availability and operational integrity using a reliable, documented method.

      Out of Scope

      The following items are explicitly not included in the scope of this feature:

      • Automation: The procedure remains a manual, administrator-driven process. Full or partial automation (e.g., through an Operator or specialized tooling) is out of scope.
      • Complex Failure Modes: The focus is on replacing a single, non-recoverable failed node. Complex failure modes, such as network partitions or the simultaneous failure of multiple nodes, are not covered.

              mzasepa Michal Zasepa