-
Epic
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
Draft Admin Centered Upgrade Documentation Phase 4
-
BU Product Work
-
False
-
False
-
To Do
-
OCPSTRAT-1064 - Improve upgrades - phase 3 - Control plane & worker node independence
-
Impediment
-
63% To Do, 0% In Progress, 38% Done
-
Undefined
OCP/Telco Definition of Done
Epic Template descriptions and documentation.
Epic Goal
- Revamp our Upgrade Documentation to include an appropriate level of detail for admins
Why is this important?
- Admins currently have nothing that explains how upgrades actually work, so they panic when things don't go perfectly
- We do not sufficiently explain the differences between the Degraded and Available statuses, at least not within the context of the Upgrade Docs
- We do not explain order of operations
- We do not explain the protections built into the platform that guard against total cluster failure, i.e., halting when components do not return to a healthy state within exp
Scenarios
- Move channel management out into its own chapter
- Explain or link to existing documentation which addresses the differences between Degraded=True and Available=False
- Explain Upgradeable=False conditions and other aspects of the upgrade preflight strategy that Operators should use to indicate when it's unsafe to upgrade
- Explain basics of how the upgrade is applied
  - CVO fetches release image
  - CVO updates operators in the following order
  - Each operator is expected to monitor for success
- Provide an example ordering of manifests and a command to extract release-specific manifests and infer the ordering
- Explain how operators indicate problems and generic processes for investigating them
- Explain the special role of MCO and MCP mechanisms such as pausing pools
- Provide basic guidance on Control Plane upgrade duration, that is, excluding worker pool rollout duration (90-120 minutes is normal)
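A possible starting point for the "infer the ordering" scenario, offered as a rough sketch rather than the CVO's actual logic. It assumes the manifests have already been extracted to a local directory (for example with something like `oc adm release extract --to=<dir> <release image>`, to be verified against the current `oc` documentation) and that file names follow a `0000_<runlevel>_<component>_<name>` pattern applied in lexicographic order. The file names in `SAMPLE`, and the helper names `parse_manifest_name` and `infer_order`, are hypothetical and only for illustration.

```python
from collections import OrderedDict


def parse_manifest_name(filename):
    """Split '0000_<runlevel>_<component>_<name>' into (runlevel, component, name).

    Returns None for files that do not match the assumed naming pattern.
    """
    parts = filename.split("_", 3)
    if len(parts) < 4 or not parts[1].isdigit():
        return None
    return int(parts[1]), parts[2], parts[3]


def infer_order(filenames):
    """Return [(runlevel, [components, ...]), ...] in the inferred apply order.

    Assumption: manifests are applied in lexicographic file name order, so
    sorting the names and grouping by runlevel approximates the sequence.
    """
    order = OrderedDict()
    for name in sorted(filenames):
        parsed = parse_manifest_name(name)
        if parsed is None:
            continue  # skip files outside the assumed pattern
        runlevel, component, _ = parsed
        components = order.setdefault(runlevel, [])
        if component not in components:
            components.append(component)
    return list(order.items())


# Hypothetical file names, not taken from a real release image.
SAMPLE = [
    "0000_50_cluster-ingress-operator_00-namespace.yaml",
    "0000_20_kube-apiserver-operator_00_namespace.yaml",
    "0000_50_cluster-ingress-operator_02-deployment.yaml",
    "0000_80_machine-config-operator_04_deployment.yaml",
    "0000_50_cluster-monitoring-operator_01-namespace.yaml",
    "image-references",
]

if __name__ == "__main__":
    for runlevel, components in infer_order(SAMPLE):
        print(f"runlevel {runlevel:02d}: {', '.join(components)}")
```

In practice the list of names would come from `os.listdir()` on the extraction directory. Grouping by runlevel is a deliberate simplification: it shows the coarse sequence admins care about without claiming anything about ordering or parallelism inside a runlevel.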
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
- There was an effort to write up how to use MachineConfig Pools to partition and optimize worker rollout in https://issues.redhat.com/browse/OTA-375
Open questions:
- …
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- clones
-
OTA-810 Draft Admin Centered Upgrade Documentation Phase 3
- Closed