Feature
Resolution: Unresolved
Feature Overview (aka. Goal Summary)
This feature introduces customization of the cluster update/upgrade strategy: a new configuration option that allows multiple cluster nodes to be upgraded in parallel, reducing the overall upgrade time.
Goals (aka. expected user outcomes)
Customers can set non-zero values for the Machine Config Pool parameters maxUnavailable and maxSurge at the cluster level; these values are used during cluster upgrades to upgrade multiple nodes in parallel. This brings parity between self-managed OCP and ROSA/OSD clusters.
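For illustration, on self-managed OCP the analogous knob today is the MachineConfigPool field spec.maxUnavailable; the sketch below shows what a configured pool could look like. Note that maxSurge is the proposed new field in this feature and does not exist on MachineConfigPool today:

```yaml
# Sketch only: spec.maxUnavailable exists on MachineConfigPool today;
# maxSurge is the proposed addition and is shown here as an assumption.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  maxUnavailable: 3   # upgrade up to 3 worker nodes in parallel
```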
Requirements (aka. Acceptance Criteria):
Both maxUnavailable and maxSurge are:
- Configurable at the machine pool level
- Applicable only to the machine pools or worker nodes that customers create/manage
- Limited to a short range of values, (1,3) to begin with
- Defaulted to 1 (no change to current defaults)
- OCM UI, CAPA/CAPI, ROSA CLI, and Terraform support configuring these fields.
- With Terraform this would be a parameter of the ROSA cluster resource.
- Documentation needs an update in the upgrade section describing the parameters, what they do, and why they may be useful.
Use Cases (Optional):
- Cluster administrators schedule planned maintenance windows with the business, so they want to shorten the window as much as possible within the limits of cluster safety; availability of the services is not a constraint during the window.
- The workloads have restrictive PDBs (maxUnavailable=0%), so safely draining one node at a time delays, if not fails, the upgrade. A maintenance window is picked when these workloads are not running, and the window ought to finish before the workloads resume.
- Administrators already use this capability on self-managed OCP clusters, and following the same operations across different environments is preferred when migrating workloads to managed cloud services.
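The restrictive PDB in the second use case could look like the following; with maxUnavailable set to 0, a node drain can never evict the selected pods, so single-node-at-a-time upgrades stall until the workload stops. The names here are illustrative:

```yaml
# Illustrative PDB matching the second use case: no voluntary
# disruption is allowed while the workload is running.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: batch-workload-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: batch-workload
```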