Story
Resolution: Unresolved
Major
Goal
- Canary rollouts are already documented. See https://docs.openshift.com/container-platform/4.10/updating/update-using-custom-machine-config-pools.html.
- The same rollout mechanism can also shorten upgrade times. To cover that use case, extend the documentation with the following steps:
- Split workers into node pools.
- Rule of thumb for the number of nodes in a pool: spare capacity percentage (spare capacity as a share of total cluster capacity) × number of nodes in the cluster.
- This number needs to be adjusted downwards.
- Set maxUnavailable to the number of workers in the node pool.
- Pause all node pools.
- Unpause the target node pool.
- Repeat until all node pools are upgraded.
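The steps above could be sketched roughly as follows. This is an illustrative sketch, not the documented procedure: the helper names (`pool_size`, `split_into_pools`, `pool_manifest`, `rollout_commands`), the pool naming scheme, and the node names are all hypothetical. The manifest fields mirror the MachineConfigPool layout used in the linked canary rollout documentation, and the generated `oc patch` commands only toggle `spec.paused`. Rounding the rule of thumb down is one assumed way to apply the "adjust downwards" guidance.

```python
import json


def pool_size(total_nodes: int, spare_capacity_percent: int) -> int:
    """Rule of thumb: spare capacity percent times node count, rounded down."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    if not 0 < spare_capacity_percent <= 100:
        raise ValueError("spare_capacity_percent must be in 1..100")
    # Integer division rounds down; never return fewer than one node per pool.
    return max(1, total_nodes * spare_capacity_percent // 100)


def split_into_pools(workers: list[str], size: int) -> list[list[str]]:
    """Partition worker names into consecutive pools of at most `size` nodes."""
    return [workers[i:i + size] for i in range(0, len(workers), size)]


def pool_manifest(name: str, size: int) -> dict:
    """Build a paused MachineConfigPool whose maxUnavailable equals its size."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfigPool",
        "metadata": {"name": name},
        "spec": {
            "machineConfigSelector": {
                "matchExpressions": [{
                    "key": "machineconfiguration.openshift.io/role",
                    "operator": "In",
                    "values": ["worker", name],
                }]
            },
            "nodeSelector": {
                "matchLabels": {f"node-role.kubernetes.io/{name}": ""}
            },
            "maxUnavailable": size,  # whole pool may update at once
            "paused": True,          # start paused; unpause one pool at a time
        },
    }


def rollout_commands(pool_names: list[str]) -> list[str]:
    """Return oc commands: pause every pool, then unpause them one by one.

    The operator is expected to wait until a pool has finished updating
    before running the next unpause command.
    """
    def patch(name: str, paused: bool) -> str:
        body = json.dumps({"spec": {"paused": paused}})
        return f"oc patch mcp/{name} --type=merge --patch '{body}'"

    pause_all = [patch(name, True) for name in pool_names]
    unpause_each = [patch(name, False) for name in pool_names]
    return pause_all + unpause_each


if __name__ == "__main__":
    workers = [f"worker-{i}" for i in range(100)]  # hypothetical node names
    size = pool_size(len(workers), spare_capacity_percent=10)
    pools = split_into_pools(workers, size)
    names = [f"workerpool-{i}" for i in range(len(pools))]
    print(json.dumps(pool_manifest(names[0], len(pools[0])), indent=2))
    print("\n".join(rollout_commands(names)))
```

Nodes still have to be labeled with the matching `node-role.kubernetes.io/<pool name>` label to join a pool; that step is left out of the sketch.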
Why is this important?
Upgrade times can improve significantly, and the number of pod restarts is reduced.
For example, with 100 nodes split into pools of 10 nodes each, and assuming each pool takes about 10 minutes because its 10 nodes update in parallel, the overall upgrade time drops to about 100 minutes (10 pools × 10 minutes).
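The arithmetic behind this example can be written out as a small sketch. The function names are hypothetical, and the sequential baseline (one node at a time, 10 minutes per node) is an assumption added only for comparison; the story itself does not state a baseline figure.

```python
def pooled_upgrade_minutes(total_nodes: int, nodes_per_pool: int,
                           minutes_per_pool: int) -> int:
    """Estimate total time when pools update one after another.

    Assumes all nodes inside a pool update in parallel, so each pool
    costs `minutes_per_pool` regardless of its size.
    """
    # Ceiling division: a partially filled last pool still costs a full slot.
    pools = -(-total_nodes // nodes_per_pool)
    return pools * minutes_per_pool


def sequential_upgrade_minutes(total_nodes: int, minutes_per_node: int) -> int:
    """Assumed baseline: nodes update strictly one at a time."""
    return total_nodes * minutes_per_node


if __name__ == "__main__":
    print(pooled_upgrade_minutes(100, 10, 10))   # 10 pools x 10 minutes = 100
    print(sequential_upgrade_minutes(100, 10))   # assumed baseline = 1000
```

Real upgrade times also depend on drain time and pod disruption budgets, so these figures are rough estimates at best.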
Scenarios
- ...
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
Open questions:
- How do PDBs impact this?
- Can the number of pod restarts be reduced further, given the new scheduler functionality that chooses to land evicted pods on previously upgraded worker pools? Should we mention that in the documentation?