Graceful Termination of Spot Instances In Karpenter
Issue type: Epic
Status: To Do
Resolution: Unresolved
Activity type: Product / Portfolio Work
Parent: OCPSTRAT-2861 - [GA] AutoNode (Native Karpenter) with ROSA-HCP - Phase 2
Progress: 100% To Do, 0% In Progress, 0% Done
Allow customers to configure an SQS queue that enables graceful handling of spot instance interruptions. Currently, the Karpenter controller has no way to determine that a spot instance has been interrupted and is about to be terminated. As a result, when a spot instance is terminated, it is replaced only after its node has been removed from the cluster.
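Letting Karpenter handle spot interruptions means replacement nodes can be provisioned as soon as AWS issues its interruption notice (sent two minutes before the instance is reclaimed), rather than after the node has already disappeared, so interruptions are handled more gracefully.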
Acceptance Criteria
- Customers must be able to configure Karpenter with the SQS queue that will provide spot interruption events
- Customers are expected to create and configure the SQS queue themselves, guided by documentation or tooling
- The HostedCluster API must expose a field for specifying the SQS queue to use
- It must be possible to enable spot interruption handling after Karpenter has already been enabled
- If Karpenter cannot read from the SQS queue, customers should see an actionable error message explaining how to resolve the issue (e.g. the queue does not exist, or the controller lacks the required permissions)
- Karpenter should consume events from the queue and provision additional nodes if required to maintain operations within the cluster
- The Karpenter controller must have permission to read from the SQS queue
- For managed services (e.g. ROSA), this will require additional AWS IAM permissions
Resources
- clones AUTOSCALE-138 Increase Karpenter test coverage (To Do)
- links to