
      Feature Overview

      Enable autoscaling from/to zero for NodePools of Hosted Clusters, allowing customers to manage resources efficiently and reduce costs when compute is not needed. This can be achieved by adding CAPI support for autoscaling from/to zero, which should then be incorporated into the HyperShift operator logic.
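
      As a rough illustration of the intended user experience, the sketch below patches a NodePool so that autoscaling is enabled with a minimum of zero replicas. This is a hypothetical example: the API group/version, the field names (spec.autoScaling.min/max, spec.replicas), and the namespace/object names are assumptions based on the current NodePool API and may differ once the feature lands.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load kubeconfig from the default location ($HOME/.kube/config).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// GVR for HyperShift NodePools (assumed: hypershift.openshift.io/v1beta1).
	nodePoolGVR := schema.GroupVersionResource{
		Group:    "hypershift.openshift.io",
		Version:  "v1beta1",
		Resource: "nodepools",
	}

	// Merge patch: drop the fixed replica count (replicas and autoScaling are
	// mutually exclusive) and enable autoscaling with a lower bound of zero,
	// which is the new behaviour this feature would allow.
	patch := []byte(`{"spec":{"replicas":null,"autoScaling":{"min":0,"max":4}}}`)

	// "clusters" namespace and "example-nodepool" name are placeholders.
	_, err = client.Resource(nodePoolGVR).
		Namespace("clusters").
		Patch(context.TODO(), "example-nodepool", types.MergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("NodePool patched: autoscaling enabled with min=0")
}
```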

      Goals

      • Allow HCP node pools to scale down to zero nodes when not in use
      • Enable autoscaling to automatically provision nodes when workloads require them
      • Provide a seamless experience for users managing compute resources in HCP clusters

      Primary user type: Cluster Service Consumers

      Expands on existing features: Node pool management and autoscaling capabilities

      Requirements

      1. Implement the ability to set min-replicas=0 while autoscaling is enabled for HCP node pools
      2. Ensure nodes are drained properly when scaling down to zero
      3. Handle degraded operators gracefully when no data-plane compute is available
      4. Implement the cluster-autoscaler machinery changes required to support scaling from/to zero (see the sketch after this list)
      5. (optional/desired) Optimize performance to minimize the time required to scale up from zero
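
      To make requirement 4 concrete, the sketch below shows one way a zero-replica node group could be exposed to the cluster-autoscaler: with no Machines running, the autoscaler cannot infer node capacity from a live Node, so the Cluster API provider reads group size bounds and capacity hints from annotations on the backing MachineDeployment. The annotation keys follow the cluster-autoscaler Cluster API provider documentation and should be verified against the release in use; the values, namespace, and object name are illustrative only.

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// CAPI MachineDeployments back HyperShift NodePools on the management cluster.
	mdGVR := schema.GroupVersionResource{
		Group:    "cluster.x-k8s.io",
		Version:  "v1beta1",
		Resource: "machinedeployments",
	}

	// Group size bounds plus CPU/memory capacity hints so the autoscaler can
	// simulate a node for this group even while it has zero replicas.
	patch := []byte(`{"metadata":{"annotations":{` +
		`"cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size":"0",` +
		`"cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size":"4",` +
		`"capacity.cluster-autoscaler.kubernetes.io/cpu":"4",` +
		`"capacity.cluster-autoscaler.kubernetes.io/memory":"16Gi"}}}`)

	// Namespace and MachineDeployment name are placeholders for the hosted
	// control plane namespace and the NodePool-owned MachineDeployment.
	if _, err := client.Resource(mdGVR).
		Namespace("clusters-example").
		Patch(context.TODO(), "example-nodepool", types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
		panic(err)
	}
}
```

      In practice the HyperShift operator, rather than the end user, would be expected to reconcile such annotations from the NodePool's autoscaling settings.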

      Deployment considerations

      • Self-managed, managed, or both: works for both
      • Classic (standalone cluster): N/A
      • Hosted control planes: Applicable
      • Multi node, Compact (three node), or Single node (SNO), or all: N/A
      • Connected / Restricted Network: Both
      • CPU Architectures: all
      • Operator compatibility: operators must tolerate periods with no data-plane compute (see Requirements, item 3)
      • Backport needed: To be determined based on priority and release schedule
      • UI need: should be tracked in a separate OCM Jira

      Use Cases

      1. Cost optimization: Customers can scale down to zero nodes during off-hours or low-demand periods
      2. On-demand scaling: Automatically provision nodes when workloads require them
      3. Development and testing: Easily spin up and down compute resources for development and testing environments

      Questions to Answer

      1. What changes are required in the cluster-autoscaler machinery to support scaling from/to zero?
      2. How will we handle the transition from zero nodes to active autoscaling?
      3. What impact will this feature have on cluster startup time when scaling up from zero?
      4. Are there any potential security implications of allowing clusters to scale to zero nodes?

      Out of Scope

      • Implementing this feature for non-HCP clusters
      • Full hibernation functionality, including the control plane (as noted in the discussion)

      Background

      This feature is being requested to provide more flexibility and cost efficiency for HCP cluster management. It builds on the existing autoscaling capabilities and addresses a current limitation: a node pool's minimum replica count cannot be set to zero while autoscaling is enabled.

      Customer Considerations

      • Provide clear documentation on the implications of scaling to zero (e.g., degraded operators)
      • Ensure a smooth user experience when transitioning between zero and active nodes
      • Consider potential impact on SLAs and cluster responsiveness

      Documentation Considerations

      • Create new documentation explaining how to enable and use autoscaling from/to zero
      • Update existing node pool and autoscaling documentation to include this new functionality
      • Provide best practices and considerations for using this feature

      Interoperability Considerations

      • Ensure compatibility with ROSA HCP
      • Verify interoperability with other OpenShift components and operators
      • Consider the impact on monitoring and logging systems when scaling to/from zero
