  1. OpenShift Over the Air
  2. OTA-1038

Capabilities handling for late resource inclusion and manifest-less capabilities


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Normal
    • Status: To Do
    • 75% To Do, 0% In Progress, 25% Done

      Epic Goal

      As part of the response to OCPBUGS-20321, we have identified two generic risks:

      • When identifying resources associated with a given capability, some resources may initially be missed (like the builds.config.openshift.io CRD and CR in OCPBUGS-20321).
      • Some capabilities may have no cluster-version operator manifests at all (like DeploymentConfig in OCPBUGS-20321).

      Both situations may occur occasionally in the future, and this epic is about putting in place guards and recovery mechanisms to make for less drama if that happens.
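
The second risk lends itself to a mechanical guard. The following is a minimal sketch, not cluster-version operator code: the `Manifest` type and the `capabilitiesWithoutManifests` helper are hypothetical, and the sketch assumes manifests declare membership through the `capability.openshift.io/name` annotation, with multiple capabilities joined by `+`. It lists the known capabilities that no manifest claims.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Manifest is a hypothetical, pared-down stand-in for a release manifest.
type Manifest struct {
	Name        string
	Annotations map[string]string
}

// capabilityAnnotation is the annotation assumed to tie a manifest to one or
// more capabilities (multiple names joined by "+").
const capabilityAnnotation = "capability.openshift.io/name"

// capabilitiesWithoutManifests returns, sorted, the known capabilities that
// are not claimed by any manifest.
func capabilitiesWithoutManifests(known []string, manifests []Manifest) []string {
	claimed := map[string]bool{}
	for _, m := range manifests {
		value, ok := m.Annotations[capabilityAnnotation]
		if !ok {
			continue // manifest is not tied to any capability
		}
		for _, name := range strings.Split(value, "+") {
			claimed[name] = true
		}
	}
	var missing []string
	for _, name := range known {
		if !claimed[name] {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	known := []string{"Build", "DeploymentConfig"}
	manifests := []Manifest{
		{
			Name:        "builds.config.openshift.io CRD",
			Annotations: map[string]string{capabilityAnnotation: "Build"},
		},
	}
	// DeploymentConfig has no manifests, so it is reported.
	fmt.Println(capabilitiesWithoutManifests(known, manifests))
}
```

A check like this could run in CI and fail unless a capability is explicitly declared as manifest-less; how that declaration would look is left to the API work in this epic.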

      Why is this important?

      Delivering this epic would decrease the likelihood of future issues in this space and reduce stress for Red Hat developers responding to those issues. Customer impact is expected to be minimal, although in some cases it could shorten the wait for cleaning up leaked resources from "wait for the next 4.(y+1) minor version to GA" to "wait for the next 4.y.z patch version to GA".

      Scenarios

      • For Red Hat developers declaring a new capability:
        • Improve onboarding documentation to reduce the chance that relevant resources are accidentally left enabled when the capability is disabled.
        • Improve API structures to allow them to declare their capability as having no cluster-version operator manifests.
      • For Red Hat developers responding to a missed resource: provide an API for adding the resource to the capability without implicitly enabling the entire capability on update.
      • For customers updating to pick up a fix for a missed resource: avoid having to wait until the next 4.(y+1) minor version, while also not having the entire capability implicitly enabled on update.
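
The last two scenarios turn on implicit enablement. The sketch below is a simplified model, not the cluster-version operator's implementation: the `Resource` type, the `implicitlyEnabled` function, and especially the `LateAddition` marker are hypothetical names invented for illustration. It assumes that an update implicitly enables a disabled capability when a resource the cluster already reconciles becomes tagged with that capability, and it shows how a per-resource exemption could avoid that.

```go
package main

import "fmt"

// Resource is a hypothetical model of one manifest in a release.
type Resource struct {
	Key          string   // e.g. "CustomResourceDefinition/builds.config.openshift.io"
	Capabilities []string // capabilities the manifest is tagged with
	LateAddition bool     // hypothetical marker: tag added later to fix a missed resource
}

// allEnabled reports whether every listed capability is enabled.
func allEnabled(caps []string, enabled map[string]bool) bool {
	for _, c := range caps {
		if !enabled[c] {
			return false
		}
	}
	return true
}

// implicitlyEnabled returns the disabled capabilities that updating from
// current to target would turn on under the assumed rule.
func implicitlyEnabled(current, target []Resource, enabled map[string]bool) []string {
	// Resources reconciled today: those whose capabilities, if any, are all enabled.
	reconciled := map[string]bool{}
	for _, r := range current {
		if allEnabled(r.Capabilities, enabled) {
			reconciled[r.Key] = true
		}
	}
	seen := map[string]bool{}
	var result []string
	for _, r := range target {
		// Skip resources that are new to the cluster or marked as late additions.
		if !reconciled[r.Key] || r.LateAddition {
			continue
		}
		for _, c := range r.Capabilities {
			if !enabled[c] && !seen[c] {
				seen[c] = true
				result = append(result, c)
			}
		}
	}
	return result
}

func main() {
	key := "CustomResourceDefinition/builds.config.openshift.io"
	current := []Resource{{Key: key}} // missed: not tagged with Build
	enabled := map[string]bool{}      // Build is disabled on this cluster

	tagged := []Resource{{Key: key, Capabilities: []string{"Build"}}}
	fmt.Println(implicitlyEnabled(current, tagged, enabled)) // Build would be enabled

	exempt := []Resource{{Key: key, Capabilities: []string{"Build"}, LateAddition: true}}
	fmt.Println(implicitlyEnabled(current, exempt, enabled)) // nothing is enabled
}
```

What happens to the already-installed resource once it is exempted is outside this sketch.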

      Dependencies (internal and external)

      • Refinements to the existing enhancement will require enhancement approvers, likely Lala or Ben.
      • Additions to the API will require both enhancement approvers and also API approvers.
      • Implementation of cluster-version operator changes can be handled internally by the updates team.

      Contributing Teams (and contacts)

      • Development - OTA (Lala)
      • Documentation - not applicable. The functionality changes are for internal development, and do not need customer-facing documentation.
      • QE - OTA (Jia Liu)
      • PX - OTA (Subin)
      • Others - API approvers (Ben, David)

      Acceptance Criteria

      Individual cards in this epic will have their own acceptance criteria. Once the cards are closed out (either Done or otherwise), this epic will be complete.

      Drawbacks or Risk

      Delivering these tickets will take time, and it's possible that all future capability creation is perfect without needing the changes that this epic will deliver. That doesn't seem likely to me. For example, Build and DeploymentConfig are two capabilities partially implemented by the OpenShift API server, and it seems likely that there will be future capabilities that also fall under the OpenShift API server, as more of that functionality eventually makes its way upstream.

      Done - Checklist

      • CI Testing - CI work for this epic will be discussed in individual code-delivering child tickets, so CI work for this epic will be complete when all child tickets are complete.
      • Documentation - not applicable, see the Contributing Teams section.
      • QE - The updates QE team is performing their own parallel work to shift capability-relevant tests from full-functional tests (which run after 4.y starts cutting RCs) to Prow periodics. This should reduce the time before issues are detected, but that work is not tracked in this epic. QE work for this epic will be discussed in individual code-delivering child tickets, which will be pre-merge tested, so QE work for this epic will be complete when all child tickets are complete.
      • Technical Enablement - not applicable, see the Contributing Teams section for why this is not a directly-customer-visible change.

            Assignee: Unassigned
            Reporter: W. Trevor King (trking)