  1. Container / Cluster Management (XCM) Strategy
  2. XCMSTRAT-187

Upgrade configurability: Allow customers to set maxUnavailable for MCP

      This feature continues to be in backlog due to other higher priority features. If your customer needs this feature, please link the case or opportunity to help reprioritization. 


      Feature Overview (aka. Goal Summary)  

      This feature introduces customization of the cluster update/upgrade strategy through a new configuration that allows multiple cluster nodes to be upgraded in parallel, helping to reduce the overall upgrade time.
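As a rough illustration of why parallelism shortens the upgrade, the sketch below estimates the number of upgrade rounds as the node count divided by maxUnavailable, rounded up. The function names and the fixed per-node duration are assumptions made for this example only; real upgrades also depend on drain time, PDBs, and image pulls.

```python
import math


def upgrade_rounds(node_count: int, max_unavailable: int = 1) -> int:
    """Return how many sequential batches are needed to upgrade all nodes.

    Hypothetical helper: assumes every batch takes exactly
    `max_unavailable` nodes out of service at once.
    """
    if node_count < 0:
        raise ValueError("node_count must not be negative")
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    return math.ceil(node_count / max_unavailable)


def estimated_minutes(node_count: int, minutes_per_node: int,
                      max_unavailable: int = 1) -> int:
    """Estimate total upgrade time, assuming each batch takes as long as one node."""
    return upgrade_rounds(node_count, max_unavailable) * minutes_per_node
```

Under these assumptions, a 12-node pool at 10 minutes per node would take 12 rounds with the default of 1 and 4 rounds with maxUnavailable set to 3.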

      Goals (aka. expected user outcomes)

      Customers can set a non-zero value for the Machine Config Pool (MCP) parameters maxUnavailable and maxSurge at a cluster level. These values are used during the cluster upgrade to upgrade as many nodes in parallel as they allow. This will provide parity between self-managed OCP and ROSA/OSD clusters.
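For comparison, on self-managed OCP the MachineConfigPool resource exposes a spec.maxUnavailable field. A minimal sketch of building a merge patch for it is shown below; the helper name is hypothetical, and how the managed service would expose the setting is not defined in this issue. maxSurge is left out because this sketch makes no assumption about where that field would live.

```python
import json


def mcp_max_unavailable_patch(max_unavailable: int) -> str:
    """Build a JSON merge patch that sets spec.maxUnavailable on a MachineConfigPool.

    Hypothetical helper for illustration; it only produces the patch body.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    return json.dumps({"spec": {"maxUnavailable": max_unavailable}})


if __name__ == "__main__":
    # The printed body could be passed to a command such as:
    #   oc patch machineconfigpool worker --type merge -p '<body>'
    print(mcp_max_unavailable_patch(3))
```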

      Requirements (aka. Acceptance Criteria):

      Both maxUnavailable and maxSurge must meet the following:

      1. Configurable at the machine pool level
      2. Applicable only to the machine pools or worker nodes that the customer creates/manages
      3. Allow a short range of values, (1,3), to begin with
      4. Default is 1 (no change to defaults)
      5. OCM UI, CAPA/CAPI, ROSA CLI, and Terraform support configuring these fields.
        1. With Terraform, this would be a parameter of the ROSA cluster resource
      6. Documentation needs an update in the upgrade section describing the parameters, what they do, and why they may be useful.
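Requirements 3 and 4 could be enforced with a small validation step like the sketch below. The class and constant names are hypothetical, and the sketch assumes that (1,3) means the inclusive range 1 to 3 and that the default of 1 applies to both fields.

```python
from dataclasses import dataclass

# Assumed bounds: the issue's "(1,3)" is read here as 1 to 3 inclusive.
MIN_VALUE = 1
MAX_VALUE = 3
DEFAULT_VALUE = 1


@dataclass(frozen=True)
class UpgradeSettings:
    """Hypothetical per-machine-pool upgrade settings."""

    max_unavailable: int = DEFAULT_VALUE
    max_surge: int = DEFAULT_VALUE

    def __post_init__(self) -> None:
        # Validate both fields the same way, since the requirements apply to both.
        for name in ("max_unavailable", "max_surge"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, int):
                raise TypeError(f"{name} must be an integer")
            if not MIN_VALUE <= value <= MAX_VALUE:
                raise ValueError(
                    f"{name} must be between {MIN_VALUE} and {MAX_VALUE}"
                )
```

Rejecting out-of-range values at construction time keeps every client (UI, CLI, Terraform) behind the same rule, although where the check would actually live is not stated in this issue.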

      Use Cases (Optional):

      1. Cluster administrators agree on a planned maintenance window with the business, so they want to shorten the window as much as possible within the safety limits of the cluster; availability of the services is not a constraint during the window.
      2. The workloads have restrictive PDBs (maxUnavailable=0%), so safely draining one node at a time delays, if not fails, the upgrade. A maintenance window is picked when these workloads don't run on the clusters, and the window must finish before the workloads begin.
      3. Administrators have self-managed OCP clusters that use this capability, and following the same operations across different environments is preferred when migrating workloads to managed cloud services.
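To make the second use case concrete, the sketch below models how many pods a PodDisruptionBudget lets a drain evict. It is a simplified model, not the Kubernetes controller itself: the function name is hypothetical, and it assumes percentage values of maxUnavailable are rounded up to whole pods.

```python
from typing import Union


def pdb_allowed_disruptions(expected_pods: int, healthy_pods: int,
                            max_unavailable: Union[int, str]) -> int:
    """Return how many pods may be evicted right now under a maxUnavailable budget.

    Simplified model: `max_unavailable` is either an integer or a string
    such as "25%"; percentages are rounded up to whole pods.
    """
    if isinstance(max_unavailable, str):
        percent = int(max_unavailable.rstrip("%"))
        # Integer ceiling of expected_pods * percent / 100.
        budget = (expected_pods * percent + 99) // 100
    else:
        budget = max_unavailable
    # Pods that must stay healthy for the budget to hold.
    desired_healthy = expected_pods - budget
    return max(0, healthy_pods - desired_healthy)
```

With maxUnavailable set to "0%", the function returns 0 for any pod count, which is why a node drain that respects the PDB cannot make progress while those workloads are running.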

       

            Balachandran Chandrasekaran (rh-ee-bchandra)
            Zhe Wang
            Votes: 2
            Watchers: 12

              Created:
              Updated: