Feature
Resolution: Unresolved
Major
Feature Overview
Enable autoscaling from/to zero for NodePools of Hosted Clusters, allowing customers to manage resources efficiently and reduce costs when compute is not needed. This can be achieved by relying on CAPI support for autoscaling from/to zero, which should be incorporated into the HyperShift operator logic.
Goals
- Allow HCP node pools to scale down to zero nodes when not in use
- Enable autoscaling to automatically provision nodes when workloads require them
- Provide a seamless experience for users managing compute resources in HCP clusters
Primary user type: Cluster Service Consumers
Expands on existing features: Node pool management and autoscaling capabilities
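The scale-down and scale-up goals above can be illustrated with a toy sizing model. This is a minimal sketch, not the cluster-autoscaler's algorithm: it assumes CPU is the only resource, that pods pack perfectly onto nodes, and that `NodePoolBounds` and `desired_replicas` are hypothetical names introduced only for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodePoolBounds:
    """Hypothetical autoscaling bounds for one node pool."""
    min_replicas: int  # may be 0 under this feature
    max_replicas: int


def desired_replicas(pending_cpu_millicores: int,
                     running_cpu_millicores: int,
                     node_cpu_millicores: int,
                     bounds: NodePoolBounds) -> int:
    """Return the node count needed to fit all requested CPU, clamped to bounds.

    Toy model: CPU is the only resource and pods pack perfectly onto nodes.
    """
    total = pending_cpu_millicores + running_cpu_millicores
    needed = math.ceil(total / node_cpu_millicores)
    return max(bounds.min_replicas, min(bounds.max_replicas, needed))


bounds = NodePoolBounds(min_replicas=0, max_replicas=3)
idle = desired_replicas(0, 0, 4000, bounds)          # no workloads: pool may drop to zero
first_pod = desired_replicas(2500, 0, 4000, bounds)  # first pending pod: scale up from zero
burst = desired_replicas(20000, 0, 4000, bounds)     # demand above capacity: capped at max
```

With a minimum of zero, an idle pool resolves to no nodes, and the first pending workload brings it back to one node.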
Requirements
- Implement the ability to set min-replicas=0 and autoscale=y simultaneously for HCP node pools
- Ensure proper draining of nodes when scaling down to zero
- Handle degraded operators gracefully when no data plane compute is available
- Implement changes in the cluster-autoscaler machinery to support scaling from/to zero
- (optional/desired) Optimize performance to minimize the time required to scale up from zero
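The first requirement implies a validation rule that accepts a minimum of zero when autoscaling is enabled. The sketch below shows one possible shape for that check. The function name `validate_autoscaling` and the rule that the maximum must be at least 1 are assumptions made for illustration, not the HyperShift operator's actual validation.

```python
def validate_autoscaling(min_replicas: int, max_replicas: int) -> None:
    """Raise ValueError if the autoscaling bounds are inconsistent.

    Hypothetical rules for this sketch:
      * min_replicas may be 0 (scale to zero) but not negative
      * max_replicas must be at least 1 (assumption: a pool that can
        never hold a node is treated as a configuration error)
      * min_replicas must not exceed max_replicas
    """
    if min_replicas < 0:
        raise ValueError("min_replicas must be >= 0")
    if max_replicas < 1:
        raise ValueError("max_replicas must be >= 1")
    if min_replicas > max_replicas:
        raise ValueError("min_replicas must be <= max_replicas")


def is_valid(min_replicas: int, max_replicas: int) -> bool:
    """Convenience wrapper that reports validity as a boolean."""
    try:
        validate_autoscaling(min_replicas, max_replicas)
    except ValueError:
        return False
    return True
```

Under these assumed rules, `is_valid(0, 3)` is accepted, while `is_valid(-1, 3)` and `is_valid(4, 3)` are rejected.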
Deployment considerations
- Self-managed, managed, or both: Both
- Classic (standalone cluster): N/A
- Hosted control planes: Applicable
- Multi node, Compact (three node), or Single node (SNO), or all: N/A
- Connected / Restricted Network: Both
- CPU Architectures: all
- Operator compatibility: Ensure compatibility with relevant operators
- Backport needed: To be determined based on priority and release schedule
- UI need: should be tracked in a separate OCM Jira
Use Cases
- Cost optimization: Customers can scale down to zero nodes during off-hours or low-demand periods
- On-demand scaling: Automatically provision nodes when workloads require them
- Development and testing: Easily spin up and down compute resources for development and testing environments
Questions to Answer
- What changes are required in the cluster-autoscaler machinery to support scaling from/to zero?
- How will we handle the transition from zero nodes to active autoscaling?
- What impact will this feature have on cluster startup time when scaling up from zero?
- Are there any potential security implications of allowing clusters to scale to zero nodes?
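As background for the first question: when a node group has no nodes, the autoscaler cannot infer node capacity from a running node, so the capacity has to be declared some other way. To my knowledge, the upstream cluster-autoscaler Cluster API provider reads size limits and capacity hints from annotations on the scalable resource. The sketch below builds such an annotation set. The annotation keys are recalled from the upstream provider documentation and should be verified against the version in use. The helper name `scale_from_zero_annotations` is hypothetical, and whether HyperShift would set these annotations or rely on another mechanism is an open design question, not something this ticket states.

```python
from typing import Dict

# Annotation keys as recalled from the upstream cluster-autoscaler
# Cluster API provider documentation; verify before relying on them.
MIN_SIZE = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size"
MAX_SIZE = "cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size"
CAPACITY_CPU = "capacity.cluster-autoscaler.kubernetes.io/cpu"
CAPACITY_MEMORY = "capacity.cluster-autoscaler.kubernetes.io/memory"


def scale_from_zero_annotations(min_size: int, max_size: int,
                                cpu: str, memory: str) -> Dict[str, str]:
    """Build the annotations for a node group that may scale from zero.

    Hypothetical helper: annotation values are always strings, so the
    integer bounds are converted here.
    """
    if not 0 <= min_size <= max_size:
        raise ValueError("expected 0 <= min_size <= max_size")
    return {
        MIN_SIZE: str(min_size),
        MAX_SIZE: str(max_size),
        CAPACITY_CPU: cpu,
        CAPACITY_MEMORY: memory,
    }


annotations = scale_from_zero_annotations(0, 3, cpu="4", memory="16G")
```

The capacity values here (`"4"` CPUs, `"16G"` memory) are placeholders, not figures taken from any real instance type.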
Out of Scope
- Implementing this feature for non-HCP clusters
- Full hibernation functionality that also covers the control plane (as mentioned in the discussion)
Background
This feature is being requested to provide more flexibility and cost-efficiency for HCP cluster management. It builds upon the existing autoscaling capabilities and addresses limitations in current node pool management when combined with autoscaling.
Customer Considerations
- Provide clear documentation on the implications of scaling to zero (e.g., degraded operators)
- Ensure a smooth user experience when transitioning between zero and active nodes
- Consider potential impact on SLAs and cluster responsiveness
Documentation Considerations
- Create new documentation explaining how to enable and use autoscaling from/to zero
- Update existing node pool and autoscaling documentation to include this new functionality
- Provide best practices and considerations for using this feature
Interoperability Considerations
- Ensure compatibility with ROSA HCP
- Verify interoperability with other OpenShift components and operators
- Consider the impact on monitoring and logging systems when scaling to/from zero