OpenShift Autoscaling / AUTOSCALE-354

Enable graceful termination for spot instances in Karpenter


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Undefined
    • Epic Name: Graceful Termination of Spot Instances In Karpenter
    • Product / Portfolio Work
    • Status: To Do
    • OCPSTRAT-2861 - [GA] AutoNode (Native Karpenter) with ROSA-HCP - Phase 2
    • 100% To Do, 0% In Progress, 0% Done

      Allow customers to configure an SQS queue that delivers spot instance interruption events to Karpenter, so that interruptions are handled gracefully. Currently, the Karpenter controller has no way to detect that a spot instance has been interrupted and is about to be reclaimed. As a result, when a spot instance is terminated, a replacement is only provisioned after the node has been removed from the cluster.

      Letting Karpenter handle spot interruptions means that replacement nodes can be provisioned ahead of time, before the interrupted instance is reclaimed, rather than only after the node has disappeared.
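      As a rough illustration of what consuming the queue could involve, the sketch below assumes the queue is fed by an Amazon EventBridge rule matching EC2 Spot Instance Interruption Warning events, so each SQS message body is the EventBridge event JSON. It only extracts the interrupted instance ID and maps it to a node; receiving and deleting messages through the AWS SDK is omitted. The function names, the SpotInterruption type and the node_by_instance lookup table are hypothetical and are not part of Karpenter.

```python
import json
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# detail-type that EventBridge assigns to EC2 Spot two-minute interruption notices
SPOT_INTERRUPTION_WARNING = "EC2 Spot Instance Interruption Warning"


@dataclass(frozen=True)
class SpotInterruption:
    instance_id: str
    action: str  # "terminate", "stop" or "hibernate"


def parse_spot_interruption(message_body: str) -> Optional[SpotInterruption]:
    """Return the interrupted instance if the SQS message body is a spot
    interruption warning forwarded by EventBridge, otherwise None."""
    event = json.loads(message_body)
    if event.get("source") != "aws.ec2" or event.get("detail-type") != SPOT_INTERRUPTION_WARNING:
        # Other events that could share the queue (e.g. rebalance
        # recommendations) are ignored by this sketch.
        return None
    detail = event["detail"]
    return SpotInterruption(instance_id=detail["instance-id"], action=detail["instance-action"])


def nodes_to_replace(message_bodies: Iterable[str], node_by_instance: Dict[str, str]) -> List[str]:
    """Hypothetical helper: map interrupted instances to the nodes they back so
    replacement capacity can be requested before the instance is reclaimed."""
    nodes = []
    for body in message_bodies:
        interruption = parse_spot_interruption(body)
        if interruption is not None and interruption.instance_id in node_by_instance:
            nodes.append(node_by_instance[interruption.instance_id])
    return nodes


if __name__ == "__main__":
    sample = json.dumps({
        "source": "aws.ec2",
        "detail-type": SPOT_INTERRUPTION_WARNING,
        "detail": {"instance-id": "i-0123456789abcdef0", "instance-action": "terminate"},
    })
    print(nodes_to_replace([sample], {"i-0123456789abcdef0": "ip-10-0-1-23.ec2.internal"}))
```

      Filtering on both source and detail-type keeps the sketch safe if other EC2 event types are routed to the same queue.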

      Acceptance Criteria

      • Customers must be able to configure Karpenter with the SQS queue that provides spot interruption events
        • Customers are expected to create and configure the SQS queue themselves, by following documentation or using tooling
        • The HostedCluster API must expose a knob for specifying the SQS queue to use
        • It must be possible to enable spot interruption handling after Karpenter has already been enabled
      • If Karpenter cannot read from the SQS queue, customers should be shown an actionable error message explaining how to resolve the issue, e.g. the queue does not exist or the controller lacks the required permissions
      • Karpenter should consume events from the queue and provision additional nodes as required to maintain operations within the cluster
      • The Karpenter controller must have the permissions required to read from the SQS queue
        • For managed services such as ROSA, this will require additional AWS IAM permissions
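      A minimal sketch of how the configuration and error-reporting criteria could fit together: an optional queue setting, where an unset value means interruption handling is off so it can be turned on after Karpenter is already running, plus a translation of queue read failures into remediation hints. The KarpenterSpotSettings type, the interruption_queue field and queue_error_message are hypothetical and do not reflect the actual HostedCluster API. The SQS error codes are assumptions that depend on which SQS API protocol the client uses, and the IAM actions are an assumed minimal set for looking up, receiving and deleting messages.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed minimal IAM actions a queue consumer needs.
REQUIRED_SQS_ACTIONS = ("sqs:GetQueueUrl", "sqs:ReceiveMessage", "sqs:DeleteMessage")

# Assumed SQS error codes; query-protocol and JSON-protocol spellings both listed.
_MISSING_QUEUE_CODES = {"AWS.SimpleQueueService.NonExistentQueue", "QueueDoesNotExist"}
_ACCESS_DENIED_CODES = {"AccessDenied", "AccessDeniedException"}


@dataclass
class KarpenterSpotSettings:
    """Hypothetical shape of the HostedCluster knob. None disables interruption
    handling, so a queue can be added later without re-enabling Karpenter."""
    interruption_queue: Optional[str] = None  # queue name or URL (assumption)

    @property
    def interruption_handling_enabled(self) -> bool:
        return bool(self.interruption_queue)


def queue_error_message(queue: str, error_code: str) -> str:
    """Turn an SQS error code into an actionable message for the customer."""
    prefix = f"Karpenter cannot read spot interruption events from SQS queue {queue!r}: "
    if error_code in _MISSING_QUEUE_CODES:
        return prefix + "the queue does not exist; create it or correct the queue configured on the HostedCluster."
    if error_code in _ACCESS_DENIED_CODES:
        actions = ", ".join(REQUIRED_SQS_ACTIONS)
        return prefix + f"access denied; grant the Karpenter controller role {actions} on this queue."
    return prefix + f"unexpected error {error_code}; check the queue configuration and controller logs."
```

      Keying the hints on error codes rather than on free-form error text keeps the messages stable if the SDK's wording changes.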

      Resources

              Assignee: Unassigned
              Reporter: Alberto Garcia Lamela (agarcial@redhat.com)
