-
Epic
-
Resolution: Done
-
Support z-stream rollbacks
-
BU Product Work
-
False
-
False
-
To Do
-
OCPSTRAT-975 - Support assisted z-rollback for OCP EUS versions from 4.16+
-
0% To Do, 0% In Progress, 100% Done
-
Undefined
Epic Goal
- Validate z-stream rollbacks in CI starting with 4.16 by ensuring that a rollback completes unassisted and the e2e test suite passes
- Provide internal documentation (private KCS article) that explains when this is the best course of action versus working around a specific issue
- Provide internal documentation (private KCS article) that explains the expected cluster degradation until the rollback is complete
- Provide internal documentation (private KCS article) outlining the process and any post rollback validation
Why is this important?
- Even if upgrade success is 100%, there's some chance that we've introduced a change which is incompatible with a customer's needs and they want to roll back to the previous z-stream
- Previously we've relied on backup and restore here. However, because of the many problems with time travel, that approach is only appropriate for disaster recovery scenarios where the cluster is either already completely shut down or can acceptably be shut down, and where losing any workload state change made after the backup (PVs that were attached after the backup was taken, etc.) is also acceptable
- We believe that we can reasonably roll back to a previous z-stream
Scenarios
- Upgrade from 4.16.z to 4.16.z+n
- oc adm upgrade rollback-z-stream – initially a hidden command; it will look at the clusterversion history and roll back to the previous version if and only if that version is a z-stream away
- Rollback from 4.16.z+n to exactly 4.16.z, during which the cluster may experience degraded service and/or periods of service unavailability but must eventually complete with no further admin action
- Must pass the 4.16.z e2e test suite
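The "previous version, if and only if that version is a z-stream away" rule from the scenarios above can be sketched as follows. This is an illustration only, not the implementation used by oc or the cluster-version operator. The function names (parse_version, is_z_stream_rollback, rollback_target) are hypothetical, and the sketch assumes the clusterversion history is available as a list of entries with a "version" field, ordered newest first. It does not consider whether the previous update completed.

```python
import re
from typing import Optional

# Matches the leading major.minor.patch of a version string such as "4.16.3".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def parse_version(version: str) -> tuple:
    """Return (major, minor, patch) as integers, ignoring any suffix."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"not a major.minor.patch version: {version!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch)


def is_z_stream_rollback(current: str, target: str) -> bool:
    """True when target has the same major.minor as current and an older patch."""
    cur_major, cur_minor, cur_patch = parse_version(current)
    tgt_major, tgt_minor, tgt_patch = parse_version(target)
    return (cur_major, cur_minor) == (tgt_major, tgt_minor) and tgt_patch < cur_patch


def rollback_target(history: list) -> Optional[str]:
    """Pick the rollback target from a newest-first history (assumed layout).

    Returns the previous version only if it is a z-stream away from the
    current one; otherwise returns None and no rollback is proposed.
    """
    if len(history) < 2:
        return None  # nothing to roll back to
    current = history[0]["version"]
    previous = history[1]["version"]
    if is_z_stream_rollback(current, previous):
        return previous
    return None


if __name__ == "__main__":
    # Hypothetical history: the cluster was updated from 4.16.3 to 4.16.5.
    sample_history = [{"version": "4.16.5"}, {"version": "4.16.3"}]
    print(rollback_target(sample_history))
```

With a history whose previous entry is on a different minor version (for example 4.17.0 preceded by 4.16.9), rollback_target returns None, which mirrors the "if and only if" restriction in the scenario.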
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- Fix all bugs listed here
project = "OpenShift Bugs" AND affectedVersion in (4.16, 4.17) AND labels = rollback AND status not in (Closed) ORDER BY status DESC
Documentation
KCS: https://access.redhat.com/solutions/7089715
Open questions:
- At least today, we intend to surface this process only internally and work through it with customers actively engaged with support. Where do we put that?
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- is cloned by
-
OTA-1287 z-stream rollbacks improvements
- Closed
- is related to
-
OTA-941 Require forcing to get the cluster-version operator to accept rollbacks
- Closed
- relates to
-
OTA-1037 Unexpected error "ClusterOperators did not settle" occurs during Rollback
- Closed
-
OCPSTRAT-1499 Provide the ability for rolling back Control plane for hosted control plane
- New
- links to
(6 links to)
There are no Sub-Tasks for this issue.