  1. OpenShift Over the Air
  2. OTA-1352

Transparency around newer to older release "updates"


    • Type: Epic
    • Priority: Undefined
    • Status: To Do
    • Resolution: Unresolved
    • Progress: 100% To Do, 0% In Progress, 0% Done

      Epic Goal

      The goal of this epic is to make it clear that updates from 4.y.new to 4.(y+1).older would expose clusters to bugs that are fixed in 4.y.new by backports from 4.(y+1), where the 4.(y+1) fix landed after 4.(y+1).older was released.

      Why is this important?

      4.14 and later have rollback-prevention guards that block updates like 4.y.big to 4.y.small (in most cases) and 4.y to 4.(y-1), among others. Moving to an older Semantic Version clearly exposes a cluster to bugs and loss of features, and we don't get many requests for it outside of the rollback situation that OTA-455 touched on.

      However, there is considerable confusion around 4.y.z to 4.(y+1).z' updates, where you gain features but, depending on z and z', could gain or lose bug fixes. For example, OCPBUGS-41910 shipped in 4.16.14. It is the 4.16 backport of 4.17's OCPBUGS-41908, which missed 4.17.0 and will likely ship in a later 4.17.z. Customers on 4.16.14 may wonder why they are not seeing updates to 4.17.0: that update would regress the cluster on that Telemetry series, along with any other bugs with similar backport timing. This epic will make that reasoning clear to cluster administrators, so they are less likely to ask about it in backchannel discussions. That shortens the time it takes cluster administrators to get their questions answered and reduces the effort CVO maintainers currently invest in answering those backchannel questions.
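      The backport-timing reasoning above can be sketched as a set comparison: a fix is regressed when the source release contains it and the target release does not. This is a minimal Python sketch under stated assumptions, not Cincinnati's implementation. The FIXES table, its layout (the first release in each 4.y stream that ships a given fix, or None if no release in that stream has it yet), and the contains_fix and regressed_fixes helpers are all hypothetical; the table encodes only the OCPBUGS-41908/OCPBUGS-41910 example from this epic.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[int, int, int]

def parse(version: str) -> Version:
    """Parse an 'x.y.z' release name into a comparable tuple."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)

# Hypothetical table: for each fix, the first release in each 4.y stream
# that contains it, or None if no release in that stream has it yet.
# OCPBUGS-41910 (the 4.16 backport) shipped in 4.16.14; OCPBUGS-41908
# missed 4.17.0 and has not shipped in 4.17 in this snapshot.
FIXES: Dict[str, Dict[str, Optional[str]]] = {
    "OCPBUGS-41908": {"4.16": "4.16.14", "4.17": None},
}

def contains_fix(release: str, shipped: Dict[str, Optional[str]]) -> bool:
    """True if `release` is at or after the first release in its 4.y stream with the fix."""
    major, minor, _ = parse(release)
    first = shipped.get(f"{major}.{minor}")
    return first is not None and parse(first) <= parse(release)

def regressed_fixes(source: str, target: str,
                    fixes: Dict[str, Dict[str, Optional[str]]] = FIXES) -> List[str]:
    """Fixes present in `source` but missing from `target` (hypothetical helper)."""
    return [name for name, shipped in fixes.items()
            if contains_fix(source, shipped) and not contains_fix(target, shipped)]

print(regressed_fixes("4.16.14", "4.17.0"))  # ['OCPBUGS-41908']
print(regressed_fixes("4.16.13", "4.17.0"))  # []
```

      Keying each entry by the 4.17 bug and recording the 4.16 backport's release under it is a simplification; it only illustrates why 4.16.13 can safely move to 4.17.0 while 4.16.14 cannot.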

      Scenarios

      To deliver this context, Cincinnati and the OpenShift Update Service (OSUS) will inject an OpenShiftBackportPolicy conditional update issue on the update from each 4.y.z to at least one 4.(y+1).z' whose backing code is older than 4.y.z's and is therefore susceptible to these backport timing issues. After the initial automation, ongoing maintenance load for update developers is expected to be minimal.
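      A rough sketch of how such a risk might be generated for one source release. It assumes that release dates are an adequate proxy for "backing code is older" and that every flagged target gets a risk that applies to all clusters; the backport_policy_risks function, its date inputs, and the URL argument are hypothetical. The dictionary keys (url, name, message, and matchingRules with an Always rule) mirror the conditional-update risk fields that ClusterVersion already exposes.

```python
from datetime import date
from typing import Dict

def backport_policy_risks(source: str, source_released: date,
                          targets: Dict[str, date], url: str) -> Dict[str, dict]:
    """Return an OpenShiftBackportPolicy risk for each target released before `source`.

    Assumption (hypothetical heuristic): a 4.(y+1).z' released before 4.y.z
    may lack backports that 4.y.z already has.
    """
    risks: Dict[str, dict] = {}
    for target, target_released in sorted(targets.items()):
        if target_released < source_released:
            risks[target] = {
                "name": "OpenShiftBackportPolicy",
                "url": url,
                "message": (
                    f"{target} was released before {source}, so it may be missing "
                    f"bug fixes that were backported into {source}."
                ),
                # Always: the risk applies to every cluster on this update path.
                "matchingRules": [{"type": "Always"}],
            }
    return risks
```

      Using release dates keeps the sketch self-contained; a real generator could instead compare the actual fix contents of each release, as in the backport-timing example earlier in this epic.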

      Cluster admins and other OSUS consumers will receive this information through existing conditional-update-issue channels (oc adm upgrade, the in-cluster web console, etc.) and will hopefully be able to answer their own questions without reaching out for additional context.

      Dependencies

      Only the update team needs to be involved in delivering this epic. The remaining tooling is either already in place or already under development (e.g. XCMSTRAT-513, ACM-13697), regardless of whether conditional update issues are extended to cover this backport timing issue.

      Contributing Teams

      • Development - OTA
      • Documentation - not required
      • QE - OTA
      • PX - OTA

      Acceptance Criteria

      A 4.y.z tip release's oc adm upgrade (or OC_ENABLE_CMD_UPGRADE_RECOMMEND=true oc adm upgrade recommend) output will include an OpenShiftBackportPolicy risk for at least one 4.(y+1).z' for any 4.y.z affected by this backport timing.
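      For QE, the same criterion could also be checked against ClusterVersion JSON, for example the output of oc get clusterversion version -o json. This is a minimal sketch, assuming the status.conditionalUpdates[].release.version and status.conditionalUpdates[].risks[].name fields; the backport_policy_targets helper and the sample document are hypothetical.

```python
import json
from typing import List

def backport_policy_targets(clusterversion: dict) -> List[str]:
    """Target versions whose conditional update carries an OpenShiftBackportPolicy risk."""
    targets = []
    for update in clusterversion.get("status", {}).get("conditionalUpdates", []):
        names = {risk.get("name") for risk in update.get("risks", [])}
        if "OpenShiftBackportPolicy" in names:
            targets.append(update["release"]["version"])
    return targets

# Trimmed, illustrative ClusterVersion document (not real cluster output).
sample = json.loads("""
{"status": {"conditionalUpdates": [
  {"release": {"version": "4.17.0"},
   "risks": [{"name": "OpenShiftBackportPolicy", "matchingRules": [{"type": "Always"}]}]}
]}}
""")
print(backport_policy_targets(sample))  # ['4.17.0']
```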

      Drawbacks or Risk

      Updates in conditionalUpdates (in ClusterVersion) or in conditionalEdges (in supported OSUS channels) are currently supported regardless of whether the declared update issues are assessed to apply to the current cluster. However, acceptedRisks in ClusterVersion history should record any accepted risk, and it seems unlikely that customer administrators would accept these risks, update into the regressions we warned about, and then complain about being surprised by those regressions. Still, it is worth checking in with product experience and support folks to make sure everyone is comfortable with changes in this space before rolling anything out.

      One benefit of delivering the feature via Cincinnati is that we can revert the Cincinnati changes if the confusion the change causes seems to outweigh the confusion it reduces.

      Done - Checklist

      The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

      • CI Testing - Tests are merged and completing successfully
      • Documentation - Content development is complete.
      • QE - Test scenarios are written and executed successfully.
      • Technical Enablement - Slides are complete (if requested by PLM)
      • Other 

              Assignee: Unassigned
              Reporter: W. Trevor King (trking)
              Votes: 1
              Watchers: 8