  1. OpenShift Bugs
  2. OCPBUGS-58236

Cluster Autoscaler does not balance scale-up across node groups when free resources differ significantly


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • 4.20.0
    • Cluster Autoscaler
    • Quality / Stability / Reliability
    • False
    • 3
    • Critical
    • Rejected

      Description of problem:

      The Cluster Autoscaler decides whether node groups are similar by comparing several factors, including capacity, allocatable resources, free resources, and labels. When the difference in free resources between node groups exceeds the default threshold (10%), the autoscaler treats the groups as dissimilar and does not balance scale-up across them. The result is uneven scaling and inefficient resource utilization.
      
      Our testing shows that raising the --max-free-difference-ratio parameter (for example, to 0.5) lets the autoscaler balance scale-ups across node groups. However, this parameter is not currently exposed to users through a configurable API. Additionally, in HyperShift only some of the related flags are hardcoded, and in standalone OpenShift this flag is not exposed at all.
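
      To illustrate why the threshold matters, here is a minimal Python sketch of a free-resource similarity check. It is not the autoscaler's code: the function names, the resource figures, and the rule of measuring the difference relative to the larger value are assumptions made for illustration. Only the 10% and 0.5 thresholds come from this report.

```python
def within_tolerance(a, b, max_ratio):
    """Return True when a and b differ by at most max_ratio of the larger value.

    Assumption: the difference is measured relative to the larger quantity.
    """
    larger, smaller = max(a, b), min(a, b)
    return larger - smaller <= larger * max_ratio


def free_resources_similar(free_a, free_b, max_free_difference_ratio):
    """Compare the free resources of two node groups, resource by resource."""
    # Groups that report different resource names are never considered similar.
    if free_a.keys() != free_b.keys():
        return False
    return all(
        within_tolerance(free_a[name], free_b[name], max_free_difference_ratio)
        for name in free_a
    )


# Hypothetical free resources (millicores, MiB) for two node groups.
group_a = {"cpu": 4000, "memory": 16000}
group_b = {"cpu": 2800, "memory": 12000}

print(free_resources_similar(group_a, group_b, 0.10))  # threshold from the report
print(free_resources_similar(group_a, group_b, 0.5))   # value used in our testing
```

      In this made-up case the CPU values differ by 30% of the larger one, so the groups fail the check at 0.10 and pass it at 0.5, which mirrors the behavior described above.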
          

      Version-Release number of selected component (if applicable):

          

      How reproducible:

      Consistently reproducible in clusters with multiple node groups having significant differences in free resources at creation time.    

      Steps to Reproduce:

          1. Create a cluster with multiple node groups where free resources differ significantly at creation time.
          2. Trigger a scale-up event.
          3. Observe that the Cluster Autoscaler does not perform balanced scale-up across node groups.
          4. Manually set --max-free-difference-ratio=0.5 in the autoscaler configuration.
          5. Observe that balanced scale-up across node groups is restored.

      Actual results:

          The Cluster Autoscaler treats the node groups as dissimilar and does not split the scale-up evenly. Some node groups scale up while others remain unchanged, leading to inefficient resource usage.

      Expected results:

          The Cluster Autoscaler balances scale-up evenly across similar node groups even when free resource differences are moderately high, with the tolerance controlled by a user-configurable max-free-difference-ratio parameter.
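
          As a rough picture of what "balanced" means here, the following Python sketch distributes new nodes across groups that have already been judged similar, always growing the smallest group first. The function name and group names are hypothetical, and the sketch ignores per-group maximum sizes; it is a simplification for illustration, not the autoscaler's actual algorithm.

```python
def balance_scale_up(current_sizes, new_nodes):
    """Split new_nodes across groups so that final sizes are as even as possible.

    current_sizes maps a group name to its current node count.
    Returns a mapping of group name to the number of nodes to add.
    """
    sizes = dict(current_sizes)
    for _ in range(new_nodes):
        # Pick the smallest group; ties are broken alphabetically by name.
        smallest = min(sorted(sizes), key=sizes.get)
        sizes[smallest] += 1
    return {name: sizes[name] - current_sizes[name] for name in sizes}


# Hypothetical similar node groups, one per zone.
plan = balance_scale_up({"zone-a": 3, "zone-b": 3, "zone-c": 1}, 4)
print(plan)
```

          With these inputs the smallest group receives two nodes and the others one each, ending at sizes 4, 4, and 3. When the groups are instead treated as dissimilar, all four nodes could land in a single group, which is the behavior reported under "Actual results".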

      Additional info:

          https://github.com/openshift/kubernetes-autoscaler/blob/f746d442e69be1cf82cef1c473ddc0ab8a15d22f/cluster-autoscaler/main.go#L262 

              mimccune@redhat.com Michael McCune
              rhn-support-liangli Liangquan Li
              Paul Rozehnal