OpenShift Over the Air / OTA-858

Automation for graph-data update risk extension


    • Type: Story
    • Resolution: Done
    • Priority: Major
    • OTA 231

      cblecker.openshift noticed 4.11.21 entering fast channels without a declared ARM64SecCompError524 risk, even though that bug is still not fixed in new 4.11.z releases. lmohanty@redhat.com added the missing risk declaration in graph-data#2949, but we want to set up some guardrails so we do not forget again. Suggested implementation:

      • A new fixedIn property in blocked-edges/*.yaml files that takes the same kind of input that the existing to property currently uses. This property only feeds hack/stabilization-changes.py; Cincinnati ignores it.
      • Until the bug has a known fixed-in version, fixedIn remains unpopulated.
      • Once the bug has a known fixed-in version, graph-data admins manually add fixedIn to the risk with the given name slug for at least the most recent 4.y.z. Adding it to earlier 4.y.z would be nice, but is not required. This busywork could potentially be automated by robots with Jira access, and possibly also access to in-flight errata associations, depending on how much we trust that metadata at candidate-promotion time, which is long before errata-release time. Any such automation would be future work.
      • ART candidate promotions happen as always, even with possibly unfixed, undeclared risks.
      • hack/stabilization-changes.py, when considering whether to promote 4.y.z into CHANNEL:
        1. Finds the most recent 4.y.z' (with z' < z) that's already in the channel.
        2. Looks up any declared risks heading into 4.y.z'.
        3. For each 4.y.z' risk:
          1. The risk is accounted for if there is a 4.y.z risk with the same name, or if the 4.y.z' risk has a fixedIn that is less than or equal to (in SemVer ordering) 4.y.z.
        4. If there are any unaccounted-for risks, the script refuses to promote the release, and complains.
        5. Existing tooling pipes those complaints into Slack (e.g. see OTA-779, complaining about promotions blocked on missing connectivity), which triggers the monitor to assess the exposure and either declare a risk for 4.y.z or set fixedIn for the 4.y.z' risk.
        6. The next stabilization script round picks up that data, finds no remaining unaccounted risks, and promotes 4.y.z into CHANNEL.
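
The promotion check in steps 1 through 4 can be sketched roughly as follows. This is only an illustration, not the actual hack/stabilization-changes.py code: the function names, the in-memory shape of the risk records (dicts with to, name, and an optional fixedIn key), and the sample versions are all assumptions, and the version parser handles only plain X.Y.Z strings rather than full SemVer with prerelease tags.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, int, int]:
    """Parse a plain X.Y.Z version (assumption: no prerelease or build metadata)."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def previous_in_channel(candidate: str, channel: list[str]) -> Optional[str]:
    """Step 1: most recent 4.y.z' already in the channel with z' < z, if any."""
    major, minor, patch = parse_version(candidate)
    older = [
        v for v in channel
        if parse_version(v)[:2] == (major, minor) and parse_version(v)[2] < patch
    ]
    return max(older, key=parse_version) if older else None


def unaccounted_risks(candidate: str, channel: list[str], risks: list[dict]) -> list[str]:
    """Steps 2-4: names of 4.y.z' risks that the candidate 4.y.z does not account for."""
    previous = previous_in_channel(candidate, channel)
    if previous is None:
        return []  # nothing to compare against
    candidate_names = {r["name"] for r in risks if r["to"] == candidate}
    missing = []
    for risk in risks:
        if risk["to"] != previous:
            continue
        if risk["name"] in candidate_names:
            continue  # same-named risk is declared for the candidate
        fixed_in = risk.get("fixedIn")
        if fixed_in is not None and parse_version(fixed_in) <= parse_version(candidate):
            continue  # the bug is fixed at or before the candidate
        missing.append(risk["name"])
    return sorted(missing)


if __name__ == "__main__":
    # Hypothetical data: 4.11.20 is in the channel with a declared risk,
    # and 4.11.21 is the candidate with no declaration and no fixedIn.
    channel = ["4.11.19", "4.11.20"]
    risks = [{"to": "4.11.20", "name": "ARM64SecCompError524"}]
    blocked = unaccounted_risks("4.11.21", channel, risks)
    if blocked:
        print("refusing to promote 4.11.21; unaccounted risks:", ", ".join(blocked))
```

With this shape, either declaring the same-named risk for the candidate or setting fixedIn on the earlier risk to a version at or below the candidate empties the returned list, which corresponds to step 6.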

              trking W. Trevor King