  1. OpenShift Pod Autoscaling
  2. PODAUTO-287

Implement HCP karpenter deletion

    • Type: Epic
    • Resolution: Unresolved
    • BU Product Work
    • In Progress
    • OCPSTRAT-943 - [Tech-Preview]Native Karpenter with ROSA+HCP
    • 67% To Do, 0% In Progress, 33% Done

      Goal

      • The goal of this epic is to support deleting Karpenter-provisioned nodes and their corresponding instances when a HyperShift HostedCluster is torn down. The end goal is that all infrastructure-backed instances are fully and automatically removed from their infrastructure by the time deletion of the HostedCluster finishes. A subgoal is to allow metrics, alerts, and events to be emitted during teardown.
      • This epic is a part of the strategic feature work for OpenShift AutoNode: https://docs.google.com/document/d/1ID_IhXPpYY4K3G_wa1MYJxOb3yz5FYoOj3ONSkEDsZs/edit?usp=sharing

      Why is this important?

      • This is important because users expect all resources belonging to a HostedCluster to be deleted when it is torn down. Karpenter instances need special care because they are provisioned outside of the cluster's environment and registered with the cluster afterwards. That means we must delete them from the infrastructure during teardown so that no resources are leaked.
      • It is also important that deletion deadlocks are minimized so that users are not stuck in deletion for an excessive amount of time.
      • Additionally, metrics, events, and alerts will allow cluster-admins to diagnose any potential problems related to Karpenter/AutoNode during the teardown phase, and allow them to safely deprovision the cluster.

      Scenarios

      1. A cluster admin creates a HostedCluster with AutoNode enabled, creates some workloads on the cluster which initiate Karpenter provisioning of nodes, and then deletes the cluster.
      2. A cluster admin creates a HostedCluster with AutoNode enabled, creates some workloads on the cluster which initiate Karpenter provisioning of nodes, and then deletes the cluster, but the deletion times out due to some issue in the deletion process.

      Acceptance Criteria

      • Dev - Deletion implementation has been merged, and metrics, alerts, events, etc. have been added.
      • Dev - Upstream docs are merged that document the deletion process and the steps to debug a stuck/failed deletion
      • CI - MUST be running successfully with tests automated
      • QE - Covered in Polarion test plan and tests implemented (create a HyperShift hosted cluster with AutoNode on, create some workloads, delete the hosted cluster, and make sure Karpenter-provisioned instances are deleted from the infrastructure)
      • Release Technical Enablement - Must have TE slides

      Dependencies (internal and external)

      1. None

      Previous Work (Optional):

      1. None

      Open questions:

      1. None for now. Some questions were covered by this spike: https://issues.redhat.com/browse/PODAUTO-313

      Done Checklist

      • CI - CI is running, tests are automated and merged. <link to tests in openshift/release>
      • Release Technical Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: N/A

              rh-ee-macao Max Cao
              agarcial@redhat.com Alberto Garcia Lamela