OpenShift Over the Air / OTA-455

Support CEE assisted z-stream rollbacks



    • Type: Epic
    • Resolution: Unresolved
    • Priority: Normal
    • Status: To Do
    • Parent: OCPSTRAT-835 - Improve upgrades - phase 2


      Epic Goal

      • Validate z-stream rollbacks in CI, starting with 4.10, by ensuring that a rollback completes unassisted and that the e2e test suite passes
      • Provide internal documentation (private KCS article) that explains when a rollback is the best course of action versus working around a specific issue
      • Provide internal documentation (private KCS article) that explains the expected cluster degradation until the rollback is complete
      • Provide internal documentation (private KCS article) that outlines the process and any post-rollback validation

      Why is this important?

      • Even if upgrade success were 100%, there is some chance that we have introduced a change that is incompatible with a customer's needs, and they may want to roll back to the previous z-stream
      • Previously we have relied on backup and restore for this. However, because of the many problems caused by time travel, that approach is only appropriate for disaster recovery scenarios where the cluster is either already completely shut down or can acceptably be shut down, and where the loss of any workload state changes is also acceptable (for example, PVs that were attached after the backup was taken)
      • We believe that we can reasonably roll back to a previous z-stream


      Scenarios

      1. Upgrade from 4.10.z to 4.10.z+n
      2. Run oc adm upgrade rollback-z-stream, which will initially be a hidden command. It looks at the ClusterVersion history and rolls back to the previous version if and only if that version is only a z-stream away (same 4.y minor version)
      3. Roll back from 4.10.z+n to exactly 4.10.z. During the rollback the cluster may experience degraded service and/or periods of service unavailability, but the rollback must eventually complete with no further admin action
      4. The cluster must pass the 4.10.z e2e test suite
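      The eligibility check in step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual oc implementation. It assumes that the ClusterVersion status.history list is ordered newest first, so history[0] is the current (possibly in-progress) target and history[1] is the version that ran before it. It also assumes that each entry carries a version string and a state of Completed or Partial. The HistoryEntry type and the parse_version and select_rollback_target functions are hypothetical. Two further rules are assumptions that the epic does not state: the target must have reached Completed, and its patch level must be strictly older than the current one.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical, simplified mirror of one ClusterVersion status.history entry.
@dataclass
class HistoryEntry:
    version: str  # e.g. "4.10.3"
    state: str    # "Completed" or "Partial"


_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch), ignoring any pre-release suffix such as -rc.1."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unparseable version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def select_rollback_target(history: List[HistoryEntry]) -> Optional[str]:
    """Return the version to roll back to, or None if a z-stream rollback is not allowed.

    Assumes history is ordered newest first: history[0] is the current target,
    history[1] is the version the cluster ran before it.
    """
    if len(history) < 2:
        return None  # no earlier version recorded
    current, previous = history[0], history[1]
    # Assumption: only return to a version the cluster fully reached.
    if previous.state != "Completed":
        return None
    cur_major, cur_minor, cur_patch = parse_version(current.version)
    prev_major, prev_minor, prev_patch = parse_version(previous.version)
    # "A z-stream away": same major.minor, differing only in the patch level.
    if (prev_major, prev_minor) != (cur_major, cur_minor):
        return None
    # Assumption: a rollback must move to a strictly older patch level.
    if prev_patch >= cur_patch:
        return None
    return previous.version


print(select_rollback_target([HistoryEntry("4.10.20", "Completed"),
                              HistoryEntry("4.10.15", "Completed")]))  # → 4.10.15
print(select_rollback_target([HistoryEntry("4.11.0", "Partial"),
                              HistoryEntry("4.10.15", "Completed")]))  # → None
```

      Requiring a strictly older patch level prevents a second invocation from "rolling back" forward. After one rollback, the newer version sits at history[1], so this rule blocks it. A y-stream update such as 4.10 to 4.11 is rejected by the major.minor comparison.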

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      Open questions:

      1. For now, we intend to surface this process only internally and to work through it with customers who are actively engaged with support. Where do we put that documentation?

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>


        Sub-tasks

        • Docs Tracker (Sub-task): Closed, Undefined, Unassigned
        • QE Tracker (Sub-task): Closed, Undefined, Unassigned
        • TE Tracker (Sub-task): Closed, Undefined, Unassigned



            lmohanty@redhat.com Lalatendu Mohanty
            rhn-support-sdodson Scott Dodson