OpenShift Hive / HIVE-1976

Tuning "BalanceSimilarNodeGroups" to true by default when creating the "default" ClusterAutoscaler


    • Type: Task
    • Resolution: Done
    • Priority: Critical

      The Red Hat OpenShift Streams for Apache Kafka (RHOSAK) product, also known as Managed Kafka, wants to use this feature on its clusters, which are created using OCM. We want to reduce the toil of manually patching the default cluster autoscaler on each of our clusters. If this RFE is accepted, it will be very useful for OSD fleet management of our services.
       
      At the moment, Hive creates a default autoscaler (https://github.com/openshift/hive/blob/4ae1f97bfc2df1d39c0879d97b8cb9022e8974bc/pkg/controller/machinepool/machinepool_controller.go#L815-L838) with only ScaleDown.Enabled=true configured. Enabling BalanceSimilarNodeGroups by default on top of this would benefit teams that use the OCM API to provision clusters, since OCM uses Hive under the hood.
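
      As a rough illustration of the requested default, the sketch below builds an autoscaler spec with both settings turned on. The types ScaleDownConfig and ClusterAutoscalerSpec and the helper defaultAutoscalerSpec are simplified, hypothetical stand-ins written for this example; they are not copied from Hive or from the ClusterAutoscaler API package, and they model only the two fields named in this issue.

```go
package main

import "fmt"

// ScaleDownConfig and ClusterAutoscalerSpec are simplified, hypothetical
// stand-ins for the ClusterAutoscaler API types. Only the two fields
// discussed in this issue are modeled.
type ScaleDownConfig struct {
	Enabled bool
}

type ClusterAutoscalerSpec struct {
	ScaleDown                *ScaleDownConfig
	BalanceSimilarNodeGroups *bool
}

// defaultAutoscalerSpec (hypothetical helper) returns the proposed default:
// scale-down stays enabled and BalanceSimilarNodeGroups is also set to true.
func defaultAutoscalerSpec() ClusterAutoscalerSpec {
	balance := true
	return ClusterAutoscalerSpec{
		ScaleDown:                &ScaleDownConfig{Enabled: true},
		BalanceSimilarNodeGroups: &balance,
	}
}

func main() {
	spec := defaultAutoscalerSpec()
	fmt.Println(spec.ScaleDown.Enabled, *spec.BalanceSimilarNodeGroups)
}
```

      A pointer is used for BalanceSimilarNodeGroups in this sketch so that "unset" can be told apart from an explicit false, which matters if existing clusters must keep their current behaviour.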

      With balance similar node groups enabled by default, the cluster autoscaler automatically identifies node groups with the same instance type and the same set of labels and tries to keep their sizes balanced. This is very useful when you have workloads spread across multiple zones for redundancy / HA (see the design proposal: https://github.com/kubernetes/autoscaler/blob/34dfd9af0a42a2de95806f500c3dc354d56c2b1c/cluster-autoscaler/proposals/balance_similar.md). With the option's current default of false, the autoscaler adds nodes to node groups at random, which leaves them unevenly sized.
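
      To make the balancing idea concrete, here is a minimal sketch that hands each new node to whichever similar node group is currently smallest. It is a simplified illustration of the goal described in the design proposal, not the cluster autoscaler's actual implementation; the function name balanceScaleUp and the plain slice of group sizes are assumptions made for this example.

```go
package main

import "fmt"

// balanceScaleUp (hypothetical helper) distributes newNodes across similar
// node groups by always growing the currently smallest group. sizes[i] is
// the current node count of group i; the input slice is not modified.
func balanceScaleUp(sizes []int, newNodes int) []int {
	result := make([]int, len(sizes))
	copy(result, sizes)
	if len(result) == 0 {
		return result
	}
	for n := 0; n < newNodes; n++ {
		// Find the first group with the minimum size.
		smallest := 0
		for i, size := range result {
			if size < result[smallest] {
				smallest = i
			}
		}
		result[smallest]++
	}
	return result
}

func main() {
	// Three zones with 3, 1 and 2 nodes; the autoscaler needs 3 more nodes.
	fmt.Println(balanceScaleUp([]int{3, 1, 2}, 3)) // prints [3 3 3]
}
```

      With balancing off, the same three nodes could all land in one zone, which is the uneven outcome this issue wants to avoid.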

              abutcher@redhat.com Andrew Butcher
              mchitimb-1 Manyanda Chitimbo
              Votes: 0
              Watchers: 11
