-
Feature
-
Resolution: Unresolved
-
Critical
-
BU Product Work
-
Program Call
Feature Overview (aka. Goal Summary)
As a cluster administrator, I want to use Karpenter instead of Cluster Autoscaler (CAS) to scale nodes on an OpenShift cluster running in AWS. I want to automatically manage heterogeneous compute resources in my OpenShift cluster without the additional manual task of managing node pools. Additional features I want are:
- Reduced cloud costs through instance selection and scaling up and down
- Support for GPUs, Spot Instances, and mixed compute types
- Automatic node lifecycle management and upgrades
This feature covers the work done to integrate upstream Karpenter 1.x with ROSA HCP. This eliminates the need for manual node pool management while ensuring cost-effective compute selection for workloads. Red Hat manages the node lifecycle and upgrades.
The feature will be rolled out first on ROSA (AWS), which has the more mature Karpenter ecosystem, followed by an ARO (Azure) implementation (see OCPSTRAT-1498).
Goals (aka. expected user outcomes)
- Run Karpenter in the management cluster and disable CAS
- Automate node provisioning in the workload cluster
- Automate lifecycle management in the workload cluster
- Reduce cost for heterogeneous compute workloads
Requirements (aka. Acceptance Criteria):
As a cluster-admin or SRE, I should be able to configure Karpenter with OCP on AWS. Both the CLI and the UI should let users configure Karpenter and disable CAS.
- Run Karpenter in management cluster and disable CAS
- OCM API
- Enable/disable Cluster Autoscaler
- Enable/disable AutoNode feature
- New role ARN configuration for Karpenter
- Optional: New managed policy or integration with existing nodepool role permissions
- Expose NodeClass/NodePool resources to users
- Secure node provisioning and management; machine approval system for Karpenter instances
- HCP Karpenter cleanup/deletion support
- ROSA CAPI fields to enable/disable/configure Karpenter
- Write end-to-end tests for Karpenter running on ROSA HCP
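To make the "Expose NodeClass/NodePool resources" requirement concrete, the following is a minimal sketch of the kind of NodePool object a user might submit. It assumes the field layout of the upstream Karpenter v1 NodePool API (karpenter.sh/v1) and an upstream-style EC2NodeClass reference; the resource kinds and fields actually exposed on ROSA HCP may differ. The helper name build_node_pool and all argument values are hypothetical.

```python
import json


def build_node_pool(name, node_class_name, capacity_types, architectures, cpu_limit):
    """Build a NodePool manifest as a plain dict.

    Assumption: field names follow the upstream Karpenter v1 NodePool API.
    The ROSA HCP integration may expose a different NodeClass kind.
    """
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Reference to the cloud-specific node class (assumed kind).
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class_name,
                    },
                    # Constraints Karpenter may choose from when launching nodes.
                    "requirements": [
                        {
                            "key": "karpenter.sh/capacity-type",
                            "operator": "In",
                            "values": list(capacity_types),
                        },
                        {
                            "key": "kubernetes.io/arch",
                            "operator": "In",
                            "values": list(architectures),
                        },
                    ],
                }
            },
            # Upper bound on total CPU this NodePool may provision.
            "limits": {"cpu": str(cpu_limit)},
            # Let Karpenter remove empty or underutilized nodes.
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": "1m",
            },
        },
    }


if __name__ == "__main__":
    pool = build_node_pool(
        name="default",
        node_class_name="default",
        capacity_types=["spot", "on-demand"],
        architectures=["amd64", "arm64"],
        cpu_limit=100,
    )
    print(json.dumps(pool, indent=2))
```

The two requirements mirror this Feature's scope (Spot and on-demand capacity, x86_64 and ARM); the JSON output could be applied to a cluster after conversion to YAML or directly as JSON.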
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, also provide the OCPSTRAT for the configuration to be supported in the future.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | managed ROSA HCP |
Classic (standalone cluster) | |
Hosted control planes | yes |
Multi node, Compact (three node), or Single node (SNO), or all | MNO |
Connected / Restricted Network | Connected |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | x86_64, ARM (aarch64) |
Operator compatibility | |
Backport needed (list applicable versions) | No |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | yes - console |
Other (please specify) | rosa-cli |
Out of Scope
High-level list of items that are out of scope. Initial completion during Refinement status.
- Supporting this feature in standalone OCP, self-hosted HCP, or ROSA Classic
- Creating a multi-provider cost/pricing operator compatible with CAPI is beyond the scope of this Feature and may take more time.
Background
Provide any additional context needed to frame the feature. Initial completion during Refinement status.
- Karpenter (karpenter.sh) is an open-source node provisioning project built for Kubernetes. It simplifies Kubernetes infrastructure by automatically launching and terminating nodes based on the needs of your workloads, which can reduce costs, improve performance, and simplify operations.
- Karpenter works by watching for pods that the scheduler cannot place (unschedulable pods) and launching new nodes to accommodate them. It can also terminate nodes that are no longer needed, which saves infrastructure costs.
- The Karpenter architecture has two components: Karpenter-core and a Karpenter-provider. The core is cloud-agnostic and performs the resource calculation that reduces cost by re-provisioning nodes; the provider contains the cloud-specific code (AWS for this Feature).
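The behavior described above (launch a node for pods that cannot be scheduled, pick a cost-effective instance, and remove nodes that are no longer needed) can be illustrated with a deliberately simplified sketch. This is not Karpenter's actual algorithm: the instance names, sizes, and prices below are invented, and the single-node "cheapest fit" rule ignores constraints such as zones, taints, and daemonset overhead.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class InstanceType:
    name: str            # hypothetical name, not a real EC2 instance type
    cpu_millis: int
    memory_mib: int
    hourly_price: float  # invented price, for illustration only


@dataclass(frozen=True)
class Pod:
    name: str
    cpu_millis: int
    memory_mib: int


def cheapest_fit(pods: List[Pod], catalog: List[InstanceType]) -> Optional[InstanceType]:
    """Return the cheapest instance type that can hold all pending pods, or None."""
    cpu = sum(p.cpu_millis for p in pods)
    mem = sum(p.memory_mib for p in pods)
    candidates = [t for t in catalog if t.cpu_millis >= cpu and t.memory_mib >= mem]
    return min(candidates, key=lambda t: t.hourly_price, default=None)


def removable_nodes(node_to_pods: Dict[str, List[str]]) -> List[str]:
    """Return the names of nodes that run no pods and could be terminated."""
    return sorted(name for name, pods in node_to_pods.items() if not pods)


CATALOG = [
    InstanceType("small", 2000, 4096, 0.10),
    InstanceType("medium", 4000, 8192, 0.19),
    InstanceType("large", 8000, 16384, 0.40),
]

PENDING = [
    Pod("web", 1500, 2048),
    Pod("worker", 1000, 1024),
]

if __name__ == "__main__":
    choice = cheapest_fit(PENDING, CATALOG)
    print("launch:", choice.name if choice else "nothing fits")
    print("terminate:", removable_nodes({"node-a": ["web"], "node-b": []}))
```

With the sample data, the pending pods need 2500m CPU in total, so the "small" type is too small and "medium" is the cheapest type that fits; "node-b" is reported as removable because it runs no pods.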
Documentation Considerations
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
- Migration guides from using CAS to Karpenter
- Performance testing to compare CAS vs Karpenter on ROSA HCP
- API documentation for NodePool and EC2NodeClass configuration
Interoperability Considerations
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
- depends on
  - OCPSTRAT-1526 [Upstream] Phase 1 : PoC for a new upstream CAPI provider for Karpenter on AWS- (part-1) (Closed)
- is cloned by
  - OCPSTRAT-1331 Provisioning Request CRD support in Autoscaler (New)
  - OCPSTRAT-1498 Native Karpenter with ARO+HCP (New)
  - OCPSTRAT-1526 [Upstream] Phase 1 : PoC for a new upstream CAPI provider for Karpenter on AWS- (part-1) (Closed)
- is depended on by
  - OCPSTRAT-1498 Native Karpenter with ARO+HCP (New)
- relates to
  - RFE-3611 Provide fallback or priorization for MachineSet/MachinePools to guarantee scale-up in case instance type is not available (Backlog)
  - RFE-3931 Karpenter support for ROSA (Accepted)
- links to