-
Epic
-
Resolution: Done
-
Optimize Minor Version Upgrade Duration
-
OCPPLAN-5484 - OpenShift 4 EUS to EUS upgrades
-
0% To Do, 0% In Progress, 100% Done
Epic Goal
- Record accurate expected durations for each step of a minor version upgrade, excluding the worker MachineConfigPool (MCP) rollout
- Identify areas for improvement and measure those improvements
Why is this important?
- We've ignored upgrade duration for the last several releases, to the point that we've had to relax existing CI tests that validated that upgrades complete within a set time limit
- When customers prepare for an EUS-to-EUS upgrade from 4.6 to 4.10, it's important that the upgrade is as fast as possible and that we can predict for them how long it should take
- We know there are portions of the upgrade which are slower than necessary, including:
  - DaemonSets that roll out serially when a canary followed by a more parallel rollout would be safe
  - Graceful shutdowns not releasing leases, so new controllers cannot immediately obtain a lease (MCO, others?)
  - Fatter images?
  - Higher cross-operator parallelization?
- Nothing should be off the table as long as it makes upgrades faster while introducing no additional risk
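The DaemonSet item above can be illustrated with a rough back-of-the-envelope model. This is only a sketch: it assumes every pod takes the same time to restart and that pods are replaced in fixed-size batches, optionally preceded by a single canary pod. The function name `rollout_seconds`, the 12-node cluster, and the 30-second restart time are hypothetical values chosen for illustration, not measurements.

```python
import math


def rollout_seconds(nodes: int, seconds_per_pod: float,
                    max_unavailable: int, canary: bool = False) -> float:
    """Estimate DaemonSet rollout time under a simplified batch model.

    Assumption (not a measurement): every pod restarts in the same time,
    and pods are replaced in batches of at most `max_unavailable`.
    With `canary=True`, one pod is updated alone before the batches start.
    """
    if nodes < 0 or max_unavailable < 1:
        raise ValueError("nodes must be >= 0 and max_unavailable >= 1")
    if nodes == 0:
        return 0.0
    if canary:
        # One canary pod first, then the remaining pods in parallel batches.
        batches = 1 + math.ceil((nodes - 1) / max_unavailable)
    else:
        batches = math.ceil(nodes / max_unavailable)
    return batches * seconds_per_pod


# Illustrative numbers only: 12 nodes, 30 s per pod restart.
serial = rollout_seconds(12, 30.0, max_unavailable=1)
parallel = rollout_seconds(12, 30.0, max_unavailable=4)
canary_then_parallel = rollout_seconds(12, 30.0, max_unavailable=4, canary=True)
print(f"serial={serial:.0f}s parallel={parallel:.0f}s canary={canary_then_parallel:.0f}s")
```

Under these assumptions the canary adds one extra batch compared with a fully parallel rollout, while still finishing well ahead of a serial one. Real rollouts depend on image pulls, readiness probes, and node conditions that this model ignores.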
Scenarios
- ...
Acceptance Criteria
- CI - A periodic job which tracks the upgrade waterfall on a moderately sized cluster with a moderate workload, so that we can monitor duration at a macro level and identify when we regress
- Stories on the boards for components which require optimization
- A report of common issues for which we should amend design patterns to identify and prevent introduction of slack
- A report which measures our success, e.g. "On a cluster with 12 nodes operating with 800 pods, we've been able to reduce the upgrade time by 7 minutes in each minor version upgrade across 4.6, 4.7, 4.8, 4.9, and 4.10"
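As a rough sketch of what the periodic job's regression tracking might compute, the snippet below turns per-step start and end timestamps into durations and flags steps that grew beyond a tolerance relative to a baseline run. The input format, the function names, the 10% tolerance, and all timings are assumptions made for illustration; they do not describe an existing CI job or its data.

```python
from datetime import datetime


def step_durations(events):
    """Return {step: seconds} from (step, start_iso, end_iso) tuples."""
    durations = {}
    for step, start, end in events:
        delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
        durations[step] = delta.total_seconds()
    return durations


def regressions(baseline, current, tolerance=0.10):
    """List steps whose duration exceeds the baseline by more than `tolerance`."""
    flagged = []
    for step, base in baseline.items():
        now = current.get(step)
        if now is not None and now > base * (1 + tolerance):
            flagged.append(step)
    return sorted(flagged)


# Hypothetical waterfall data for two runs (timings are made up).
baseline = step_durations([
    ("etcd", "2021-01-01T10:00:00", "2021-01-01T10:05:00"),
    ("machine-config", "2021-01-01T10:20:00", "2021-01-01T10:40:00"),
])
current = step_durations([
    ("etcd", "2021-02-01T10:00:00", "2021-02-01T10:05:10"),
    ("machine-config", "2021-02-01T10:20:00", "2021-02-01T10:45:00"),
])
print(regressions(baseline, current))
```

Comparing per-step durations rather than only the total makes it easier to attribute a regression to a specific component, which fits the goal of filing stories against the components that need optimization.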
Dependencies (internal and external)
- To be populated as soon as we identify any areas for improvements
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>