Add the ability to choose, in the console, between a full cluster upgrade (which exists today) and a control plane upgrade (which will pause all worker pools).
Currently in the console, users can only complete a full cluster upgrade. For many customers, upgrades take longer than their maintenance window allows. Users need the ability to upgrade the control plane independently of the worker nodes.
Ex. Upgrades of huge clusters may take too long, so admins may do the control plane this weekend, worker-pool-A next weekend, worker-pool-B the weekend after, etc. This all happens at the pool level; admins will not be able to choose specific hosts.
- Changes to the Update modal:
- Add the ability to choose between a cluster upgrade and a control plane upgrade. The design has no default selection; the update button stays disabled until the user picks one, which forces a conscious decision.
- Link out to documentation where users can learn more about update strategies
- Changes to the in-progress checklist:
- Add a status above the worker pool section that tells users all worker pools are paused, along with an action to resume all updates
- Add a "resume update" button for each worker pool entry
- Changes to the update status:
- When all master pools are updated successfully, change the status from today's "Up to date" to something like "Control plane up to date - all worker pools paused"
- Add an inline alert that tells users there is a 60-day window to update all worker pools. The alert should convey that worker pools can remain paused for as long as is normally safe, that is, until certificate rotation becomes critical at about 60 days, and it should advise the admin to unpause them to complete the full upgrade. While the MachineConfigPools (MCPs) are paused, certificate rotation does not happen, which causes the cluster to become degraded and causes multiple 'oc' commands to fail, including but not limited to 'oc debug', 'oc logs', 'oc exec' and 'oc attach'. (Are we missing anything else here?) Inline alert logic:
- From day 60 to day 10 use the default alert.
- From day 10 to day 3 use the warning alert.
- From day 3 to day 0 use the critical alert, and keep it displayed until the issue is resolved.
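
The inline alert logic above could be sketched roughly as follows. This is only an illustration, not the console's actual implementation: the names `AlertVariant` and `alertVariantForDaysRemaining` are hypothetical, and the handling of the boundary days (10 and 3), which the ranges above list in two bands each, is an assumption that assigns them to the more severe band.

```typescript
// Hypothetical names; nothing here is taken from the console codebase.
type AlertVariant = "default" | "warning" | "critical";

// Assumption: a boundary day belongs to the more severe band,
// so day 10 is already "warning" and day 3 is already "critical".
const WARNING_THRESHOLD_DAYS = 10;
const CRITICAL_THRESHOLD_DAYS = 3;

// daysRemaining: days left in the roughly 60-day window before
// certificate rotation becomes critical for paused worker pools.
function alertVariantForDaysRemaining(daysRemaining: number): AlertVariant {
  if (daysRemaining <= CRITICAL_THRESHOLD_DAYS) {
    // Zero and negative values stay critical, so the alert
    // persists until the worker pools are resumed.
    return "critical";
  }
  if (daysRemaining <= WARNING_THRESHOLD_DAYS) {
    return "warning";
  }
  return "default";
}
```

How `daysRemaining` is derived (for example, from the time the pools were paused or from certificate expiry) is left open here, since the request does not specify it.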