  1. OpenShift Hosted Control Plane
  2. HOSTEDCP-1402

Investigate using installer approach to destroying hosted cluster cloud resources


    • Epic
    • Resolution: Unresolved
    • Major
    • Installer-like approach to hosted cluster teardown
    • To Do
    • OCPSTRAT-1715 - Control Plane Operator Direct cloud resource cleanup
    • 100% To Do, 0% In Progress, 0% Done
    • Hypershift Sprint 253, Hypershift Sprint 254, Hypershift Sprint 255, Hypershift Sprint 256, Hypershift Sprint 257

      User Story:

      As a Hosted Cluster admin, I want to be able to:

      • Delete hosted clusters in the minimum time

      so that I can achieve

      • Minimum cloud resource consumption

      Service provider achieves

      • Better UX
      • Less computation related to deleting resources

      Acceptance Criteria:

      Description of criteria:

      • HyperShift directly manages resource deletion
      • Resource deletion failures alert the customer (especially important for billable items)

      Out of Scope:

      Cloud resource deletion throttling detection

      Engineering Details:

      • Currently, cloud resource cleanup is delegated to operators that run in the hosted control plane (the registry operator cleans up its bucket, the ingress operator removes additional DNS entries, the cloud controller manager removes load balancers and persistent volumes, etc.). The benefit of this approach is that we don't need cloud-specific code in the CPO to destroy resources. The drawback is that this cleanup can sometimes take a long time and depends on the hosted cluster's API server being in a healthy state.
      • A different approach that could make this process faster is to destroy resources directly, in a similar way to `openshift-install destroy cluster` or even `hypershift destroy infra`. Instead of waiting for the controllers to do the right thing, the CPO would delete the resources itself. This would make teardown more straightforward and likely much faster.
      • One consideration with this approach is that, unlike the CLI tools, the CPO doesn't have a single role that can destroy all resources. We would have to access AWS with different operator roles to destroy the different types of resources. This can be done via API calls similar to those the token-minter command makes to obtain tokens for the different service accounts.
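
The per-role teardown described above could be sketched roughly as follows. This is only an illustration of the control flow, not HyperShift code: `CloudResource`, `TeardownReport`, `destroy_resources`, and the `deleter_for_role` callback are hypothetical names, and the callback stands in for whatever mechanism would obtain credentials for a given operator role and return a function that deletes one resource. The sketch groups resources by the role allowed to delete them, keeps going when a single deletion fails, and tracks failed billable resources separately so they can be surfaced to the customer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class CloudResource:
    kind: str            # e.g. "s3-bucket", "load-balancer" (illustrative values)
    resource_id: str
    role: str            # hypothetical label for the operator role that may delete it
    billable: bool = True


@dataclass
class TeardownReport:
    deleted: List[str] = field(default_factory=list)
    failed: Dict[str, str] = field(default_factory=dict)  # resource_id -> error text
    billable_failures: List[str] = field(default_factory=list)

    def record_failure(self, resource: CloudResource, error: str) -> None:
        self.failed[resource.resource_id] = error
        if resource.billable:
            self.billable_failures.append(resource.resource_id)


# A deleter removes one resource and raises an exception on failure.
Deleter = Callable[[CloudResource], None]


def destroy_resources(
    resources: Iterable[CloudResource],
    deleter_for_role: Callable[[str], Deleter],
) -> TeardownReport:
    """Delete resources directly, one operator role at a time."""
    report = TeardownReport()

    # Group by role so credentials for each role are requested only once.
    by_role: Dict[str, List[CloudResource]] = {}
    for resource in resources:
        by_role.setdefault(resource.role, []).append(resource)

    for role, group in by_role.items():
        try:
            delete = deleter_for_role(role)
        except Exception as err:
            # Without credentials, nothing owned by this role can be deleted.
            for resource in group:
                report.record_failure(resource, f"cannot use role {role}: {err}")
            continue

        for resource in group:
            try:
                delete(resource)
                report.deleted.append(resource.resource_id)
            except Exception as err:
                # Keep going: one stuck resource should not block the rest.
                report.record_failure(resource, str(err))

    return report


if __name__ == "__main__":
    def fake_deleter_for_role(role: str) -> Deleter:
        # Stand-in for real credential handling and cloud API calls.
        def delete(resource: CloudResource) -> None:
            if resource.resource_id == "lb-stuck":
                raise RuntimeError("dependency violation")
        return delete

    report = destroy_resources(
        [
            CloudResource("s3-bucket", "registry-bucket", "registry"),
            CloudResource("load-balancer", "lb-stuck", "cloud-controller"),
        ],
        fake_deleter_for_role,
    )
    print("deleted:", report.deleted)
    print("billable failures:", report.billable_failures)
```

A non-empty `billable_failures` list is the kind of signal that could back the "deletion failures alert the customer" acceptance criterion; how that alert would actually be delivered is left open here.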

              Unassigned Unassigned
              cewong@redhat.com Cesar Wong