  1. OpenShift Over the Air
  2. OTA-253

Tech-preview preflight checks from update-target payload


    • To Do
    • Quality / Stability / Reliability
    • OCPSTRAT-2638 OpenShift Skip-Level Update
    • 83% To Do, 0% In Progress, 17% Done
    • L

      Epic Goal

      As a tech-preview feature, run pre-update compatibility checks with logic extracted from the candidate target release.

      Why is this important?

      For component maintainers, this allows for compatibility checks to ship with the release that requires them. This should be less work than the current approach, where "what's about to come in 4.y" knowledge comes in 4.y, and then needs to be backported to 4.(y-1) controllers to set the Upgradeable condition. One explicit example would be automating the manual cloud-cred compatibility check for clusters that use manual-mode credentials. In addition, preflights extracted from the update-target payload would scale conveniently to skip-level updates (OCPSTRAT-2638) while the Upgradeable condition approach is limited to discussing the next minor release.

      For update graph-data admins, long-running update risks like IPsecLargeClusterConnectivity (CORENET-6196) could be declared "fixed" once the risk detection moves into the component operator, reducing the number of situations where we have to keep asking clusters to evaluate PromQL. This also reduces the number of situations where we'd need to raise the minor_min version to pick up new guard logic (e.g. graph-data#8528).

      Cluster admins gain the ability to perform safe skip-level updates, if we move ahead with OCPSTRAT-2638. Regardless of whether we move ahead with OCPSTRAT-2638, they have a better chance of being able to do a direct 4.(y-1).old > 4.y new, vs. the current flow where they sometimes need a 4.(y-1).old > 4.(y-1).new > 4.y multi-hop to pick up a new guard with a minor_min bump. Cluster admins using manual-mode cloud credentials would also no longer need to manually check those credentials for compatibility with the new release.

      Scenarios

      Component maintainer who notices that development branch work introduces compatibility constraints

      They should be able to easily add logic to the development branch to detect incompatible configuration in 4.(y-1) clusters.

      Cluster admin running 4.(y-1) and curious about their compatibility with 4.y or 4.(y+1)

      They should be able to easily run low-impact preflight checks from the target payload and get a compatibility report.
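
As a rough illustration of this scenario, the sketch below shows one way a harness could run read-only checks against cluster state and fold the results into a compatibility report. Everything in it is hypothetical: the `Finding` and `Report` types, the `run_preflights` function, the `apis_in_use` key, and the example check are assumptions made for illustration, not the API proposed by the enhancement. Treating a check that fails to run as blocking is also an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Finding:
    """One concern raised by a check (hypothetical shape)."""
    check: str
    blocking: bool
    message: str
    remediation: str = ""


@dataclass
class Report:
    """Compatibility report for one update target (hypothetical shape)."""
    target: str
    findings: List[Finding] = field(default_factory=list)

    @property
    def blocking(self) -> List[Finding]:
        return [f for f in self.findings if f.blocking]

    @property
    def compatible(self) -> bool:
        return not self.blocking


# A check inspects a snapshot of cluster state and returns zero or more findings.
Check = Callable[[Dict[str, object]], List[Finding]]


def run_preflights(target: str, cluster_state: Dict[str, object],
                   checks: Dict[str, Check]) -> Report:
    """Run every check in name order and collect the findings."""
    report = Report(target=target)
    for name, check in sorted(checks.items()):
        try:
            report.findings.extend(check(cluster_state))
        except Exception as err:
            # Assumption: if a check cannot run, compatibility is unknown,
            # so the failure is reported as a blocking finding.
            report.findings.append(Finding(
                check=name,
                blocking=True,
                message=f"check failed to run: {err}",
                remediation="Inspect the check and retry the preflight.",
            ))
    return report


def example_removed_api_check(state: Dict[str, object]) -> List[Finding]:
    """Illustrative check: flag use of an API the target no longer serves."""
    removed = {"example.io/v1beta1"}  # made-up API group/version
    in_use = set(state.get("apis_in_use", []))
    return [
        Finding(
            check="removed-api",
            blocking=True,
            message=f"{api} is in use but is not served by the target release",
            remediation=f"Migrate resources off {api} before updating.",
        )
        for api in sorted(in_use & removed)
    ]
```

A cluster admin's request would then reduce to something like `run_preflights("4.y", state, {"removed-api": example_removed_api_check})`, with `report.blocking` listing the concerns and remediation steps to surface.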

      Dependencies (internal and external)

      API-review approval of the enhancement and associated openshift/api change.

      Cluster-operator maintainers creating checks that plug into the preflight harness, once that harness exists. At a minimum, this includes the cloud-credential operator maintainers, who can automate the manual-mode credential compatibility check.

      Contributing Teams and contacts

      • Development - OTA, API-review, CCO
      • Documentation - OTA
      • QE - OTA
      • PX - OTA
      • Others -

      Acceptance Criteria

      A cluster-admin can request a preflight check of their TechPreviewNoUpgrade cluster vs. the next minor release, and receive a report with blocking concerns and actionable steps to take to unblock those concerns. A cluster with manual-mode cloud credentials no longer needs a manual compatibility audit; instead, that check is automated, and the cluster-admin is only warned when they actually need changes.
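
The "only warned when they actually need changes" criterion for manual-mode credentials can be sketched as a comparison between what the target release requires and what the cluster already has. This is a minimal sketch under stated assumptions: the `credential_gaps` function, the `namespace/name` keys, and the permission strings are hypothetical, and it assumes both the target's requirements and the provisioned permissions are available as plain sets, which may not hold for every cloud provider.

```python
from typing import Dict, List, Set

# Both maps are keyed by "namespace/name" of a credential Secret (assumed
# convention); each value is the set of permissions attached to it.
Credentials = Dict[str, Set[str]]


def credential_gaps(required: Credentials, provisioned: Credentials) -> List[str]:
    """Return one warning per Secret that needs admin action, or [] if none do."""
    warnings: List[str] = []
    for secret in sorted(required):
        if secret not in provisioned:
            warnings.append(f"{secret}: secret is missing; create it before updating")
            continue
        missing = required[secret] - provisioned[secret]
        if missing:
            warnings.append(
                f"{secret}: grant missing permissions: {', '.join(sorted(missing))}"
            )
    return warnings
```

An empty result means the cluster-admin has nothing to do, so no warning is raised; extra permissions or Secrets that the target does not require are ignored rather than reported.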

      Drawbacks or Risk

      The current Upgradeable and conditional-update PromQL checks are evaluated continuously. This gives cluster admins early feedback and time to address any concerns, so they are less likely to be surprised by a new concern on the day they'd been hoping to update. The initial preflight work will be on-demand, and cluster admins who do not request an early preflight may not get the same early warning that Upgradeable currently delivers. Two things mitigate this user-experience risk: cluster admins who want early warning will be able to request an early preflight, and future work could consider a way to schedule automatic prechecks vs. the longest hop available in the current channel.

      Done - Checklist

      The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

      • CI Testing - Tests are merged and completing successfully
      • Documentation - Content development is complete.
      • QE - Test scenarios are written and executed successfully.
      • Technical Enablement - Slides are complete (if requested by PLM)
      • Other 

              trking W. Trevor King
              lmohanty@redhat.com Lalatendu Mohanty
              None
              None
              Yunfei Jiang Yunfei Jiang
              None