Goal

Canary rollouts are already covered in the product documentation today. See https://docs.openshift.com/container-platform/4.10/updating/update-using-custom-machine-config-pools.html
This rollout mechanism can also speed up upgrade times. To cover that use case, the documentation needs to be extended to include the following steps:
- Split workers into node pools.
  - Rule of thumb: number of nodes in a pool = (spare capacity as a percentage of overall total capacity) * (number of nodes in the cluster).
  - This number needs to be adjusted downwards.
- Set maxUnavailable to the number of workers in the node pool.
- Pause all node pools.
- Unpause the target node pool.
- Repeat until all node pools are upgraded.
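
The steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the pool names (`workerpool-0`, ...), the node names, and the helper functions are hypothetical, the spare capacity is taken as a whole-number percentage, and the script only prints `oc patch` commands in the style used by the linked documentation; it does not run them or wait for pools to finish.

```python
import json


def pool_size(total_nodes: int, spare_capacity_percent: int) -> int:
    """Rule of thumb: spare capacity percent * number of nodes, rounded down.

    Integer arithmetic is used so the result is always adjusted downwards.
    At least one node per pool is returned so the split stays meaningful.
    """
    return max(1, total_nodes * spare_capacity_percent // 100)


def split_into_pools(nodes: list, size: int) -> list:
    """Split the worker list into consecutive pools of at most `size` nodes."""
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


def patch_command(pool_name: str, spec: dict) -> str:
    """Build (not run) an `oc patch` command for a MachineConfigPool spec change."""
    patch = json.dumps({"spec": spec}, separators=(",", ":"))
    return f"oc patch mcp/{pool_name} --type merge --patch '{patch}'"


def rollout_plan(pools: dict) -> list:
    """Return the ordered commands: set maxUnavailable, pause all, unpause one by one."""
    plan = []
    for name, members in pools.items():
        # maxUnavailable = number of workers in the node pool
        plan.append(patch_command(name, {"maxUnavailable": len(members)}))
    for name in pools:
        plan.append(patch_command(name, {"paused": True}))
    for name in pools:
        plan.append(patch_command(name, {"paused": False}))
        plan.append(f"# wait until {name} has finished updating, then continue")
    return plan


if __name__ == "__main__":
    workers = [f"worker-{i}" for i in range(100)]   # hypothetical node names
    size = pool_size(len(workers), 10)              # assume 10% spare capacity
    pools = {
        f"workerpool-{n}": members                  # hypothetical pool names
        for n, members in enumerate(split_into_pools(workers, size))
    }
    for line in rollout_plan(pools):
        print(line)
```

Assigning nodes to the pools (labels and MachineConfigPool definitions) is left out here; the linked documentation describes that part.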

Why is this important?

Significant improvements in upgrade times are possible. The number of pod restarts is also reduced.

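For example, take a 100-node cluster split into pools of 10 nodes, with each node taking about 10 minutes to update. The 10 nodes in a pool update in parallel, so each pool takes only about 10 minutes, and updating the 10 pools one after another reduces the overall time to about 100 minutes. Updating the same 100 nodes one at a time would take about 1,000 minutes.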
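
The arithmetic behind this example can be checked with a short Python sketch. It assumes a flat 10 minutes per node, that all nodes in a pool update concurrently, and that pools are updated strictly one after another; `upgrade_minutes` is a hypothetical helper, and real timings depend on drain times and workload.

```python
import math


def upgrade_minutes(total_nodes: int, nodes_in_parallel: int, minutes_per_node: int) -> int:
    """Estimate total worker update time when nodes are updated in equal batches.

    Each batch (pool) is assumed to take as long as a single node, because
    all nodes in the batch update at the same time.
    """
    batches = math.ceil(total_nodes / nodes_in_parallel)
    return batches * minutes_per_node


# 100 nodes in pools of 10, maxUnavailable equal to the pool size
pooled = upgrade_minutes(total_nodes=100, nodes_in_parallel=10, minutes_per_node=10)

# the same 100 nodes updated one node at a time
serial = upgrade_minutes(total_nodes=100, nodes_in_parallel=1, minutes_per_node=10)

print(pooled, serial)  # 100 1000
```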

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

https://docs.openshift.com/container-platform/4.10/updating/update-using-custom-machine-config-pools.html.

Open questions::

How do PDBs (PodDisruptionBudgets) impact this?
Can the number of pod restarts be reduced further by the new scheduler functionality that steers evicted pods onto previously upgraded worker pools? Should we mention that in the documentation?

Assignee:: Unassigned

Reporter:: Tushar Katarki

Votes:: 0

Watchers:: 5

Created:: 2022/06/14 3:35 PM

Updated:: 2024/11/26 2:54 PM
