OpenShift Over the Air / OTA-455

Support assisted z-stream rollbacks from 4.16+

    • Epic
    • Resolution: Done
    • Normal
    • Support z-stream rollbacks
    • BU Product Work
    • To Do
    • OCPSTRAT-975 - Support assisted z-rollback for OCP EUS versions from 4.16+
    • 0% To Do, 0% In Progress, 100% Done

      Epic Goal

      • Validate z-stream rollbacks in CI starting with 4.16 by ensuring that a rollback completes unassisted and the e2e test suite passes
      • Provide internal documentation (private KCS article) that explains when this is the best course of action versus working around a specific issue
      • Provide internal documentation (private KCS article) that explains the expected cluster degradation until the rollback is complete
      • Provide internal documentation (private KCS article) outlining the process and any post-rollback validation

      Why is this important?

      • Even if upgrade success is 100%, there's some chance that we've introduced a change which is incompatible with a customer's needs and they want to roll back to the previous z-stream
      • Previously we've relied on backup and restore here. However, due to the many problems with time travel, that's only appropriate for disaster recovery scenarios where the cluster is either completely shut down already or it's acceptable to shut it down, while also accepting the loss of any workload state change made since the backup (PVs that were attached after the backup was taken, etc.)
      • We believe that we can reasonably roll back to a previous z-stream

      Scenarios

      1. Upgrade from 4.16.z to 4.16.z+n
      2. oc adm upgrade rollback-z-stream, which will initially be a hidden command. It will look at the ClusterVersion history and roll back to the previous version if and only if that version is a z-stream away
      3. Rollback from 4.16.z+n to exactly 4.16.z, during which the cluster may experience degraded service and/or periods of service unavailability but must eventually complete with no further admin action
      4. Must pass the 4.16.z e2e test suite
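
The eligibility check described in scenario 2 can be sketched roughly as follows. This is only an illustration, not the actual oc implementation. It assumes the history is passed in newest-first as a list of dicts with a "version" key (loosely modelled on ClusterVersion status.history), and it reads "a z-stream away" as "same major.minor, lower patch". The names parse_version and rollback_target are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches the leading major.minor.patch of a version such as "4.16.3".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)")


def parse_version(version: str) -> Tuple[int, int, int]:
    """Return (major, minor, patch) from a version string."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def rollback_target(history: List[Dict[str, str]]) -> Optional[str]:
    """Return the version to roll back to, or None if no rollback is allowed.

    `history` is assumed to be ordered newest first: history[0] is the
    current version and history[1] is the version the cluster ran before.
    """
    if len(history) < 2:
        return None  # nothing to roll back to

    current = parse_version(history[0]["version"])
    previous = parse_version(history[1]["version"])

    # Assumption: "a z-stream away" means same major.minor and an older patch.
    same_minor = current[:2] == previous[:2]
    older_patch = previous[2] < current[2]
    if same_minor and older_patch:
        return history[1]["version"]
    return None
```

For example, a history of 4.16.5 preceded by 4.16.3 yields 4.16.3 as the target, while a history of 4.16.5 preceded by 4.15.12 yields no target because the previous version is a minor version away.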

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • Fix all bugs listed here
        project = "OpenShift Bugs" AND affectedVersion in (4.16, 4.17) AND labels = rollback AND status not in (Closed) ORDER BY status DESC

      Documentation

      KCS: https://access.redhat.com/solutions/7089715

      Open questions

      1. At least today we intend to surface this process only internally and work through it with customers actively engaged with support. Where do we put that?

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            trking W. Trevor King
            rhn-support-sdodson Scott Dodson
            Jian Li Jian Li
            Votes: 7
            Watchers: 54
