OCPBUGS-75869 (OpenShift Bugs)

Cluster Autoscaler scale-down blocked by resource utilization threshold exceeded

    • Bug
    • Resolution: Unresolved
    • 4.21.z, 4.22
    • Cluster Autoscaler

      Description of problem:

      On OpenShift clusters created by Hive with autoscaling enabled, the Cluster Autoscaler scales up when workloads are added but does not scale back down after they are removed, so the worker count stays above what the workload requires.

      Version-Release number of selected component (if applicable):

      quay.io/openshift-release-dev/ocp-release:4.21.0-x86_64 - Failed
      registry.ci.openshift.org/ocp/release:4.22.0-0.nightly-2026-01-31-082403 - Failed
      quay.io/openshift-release-dev/ocp-release:4.20.13-x86_64 - Passed

      How reproducible:

          Always

      Steps to Reproduce:

      1. Create a Hive ClusterDeployment with a MachinePool that has autoscaling enabled (minReplicas=10, maxReplicas=12) on a multi-AZ platform (e.g. AWS with 3 AZs).
      2. Wait for the cluster to install and scale to the minimum of 10 workers; the MachineSets stabilize at [4, 3, 3].
      3. Deploy a workload on the spoke cluster (e.g. a busybox deployment) that consumes enough capacity for the autoscaler to scale up to 12 workers; the MachineSets become [4, 4, 4].
      4. Delete the busybox deployment.
      5. Wait for the autoscaler to scale the pool back down to the minimum of 10 workers.
      
      cluster-autoscaler logs:
      I0203 23:01:16.484087       1 eligibility.go:163] Node ip-10-0-46-119.us-east-2.compute.internal unremovable: memory requested (52.8758% of allocatable) is above the scale-down utilization threshold
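The log line above can be checked mechanically when triaging a larger log file. The following is a minimal Python sketch, not part of the autoscaler: the regular expression is written against the single message quoted in this report, and `ASSUMED_THRESHOLD` is an assumption (0.5 is the upstream cluster-autoscaler default for the scale-down utilization threshold; the value actually configured on the affected cluster is not stated here).

```python
import re

# Pattern written against the eligibility.go message quoted in this report.
UNREMOVABLE_RE = re.compile(
    r"Node (?P<node>\S+) unremovable: (?P<resource>\w+) requested "
    r"\((?P<percent>[\d.]+)% of allocatable\) "
    r"is above the scale-down utilization threshold"
)

# Assumption: upstream default threshold; the cluster may override it.
ASSUMED_THRESHOLD = 0.5


def parse_unremovable(line):
    """Return (node, resource, utilization fraction), or None if no match."""
    match = UNREMOVABLE_RE.search(line)
    if match is None:
        return None
    return match["node"], match["resource"], float(match["percent"]) / 100.0


line = (
    "I0203 23:01:16.484087       1 eligibility.go:163] Node "
    "ip-10-0-46-119.us-east-2.compute.internal unremovable: memory requested "
    "(52.8758% of allocatable) is above the scale-down utilization threshold"
)
node, resource, utilization = parse_unremovable(line)
print(node, resource, utilization, utilization > ASSUMED_THRESHOLD)
```

Under that assumed threshold, the node is rejected on requested memory alone, which matches the "unremovable" verdict in the log.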

      Actual results:

          Scale-down to 10 never occurs; worker count remains 12. 

      Expected results:

      The Cluster Autoscaler should consider all worker nodes in all three MachineSets for scale-down.
      Two of the three MachineSets have min=3 (the pool minimum of 10 is distributed as 4+3+3).
      The autoscaler should therefore be able to remove one node from each of those two MachineSets, reducing the total from 12 workers to 10.
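The arithmetic behind this expectation can be sketched in Python as follows. The helper names are hypothetical, and the rule that earlier zones absorb the remainder is an assumption inferred from the 4+3+3 split reported here. The sketch covers only the per-MachineSet size limits, not the utilization check that blocks removal in the log.

```python
def split_across_zones(total, zones):
    """Split `total` replicas over `zones` MachineSets.

    Assumption: the remainder goes to the earlier zones, which
    reproduces the 4+3+3 split described in this report.
    """
    base, extra = divmod(total, zones)
    return [base + 1 if i < extra else base for i in range(zones)]


def removable_per_machineset(current, minimums):
    """Nodes each MachineSet could give up before reaching its minimum."""
    return [max(cur - low, 0) for cur, low in zip(current, minimums)]


minimums = split_across_zones(10, 3)  # per-MachineSet minimums
current = split_across_zones(12, 3)   # sizes after scale-up
removable = removable_per_machineset(current, minimums)

print("minimums: ", minimums)
print("current:  ", current)
print("removable:", removable)
print("workers after scale-down:", sum(current) - sum(removable))
```

Only the two MachineSets with a minimum of 3 have headroom, so the pool can return to 10 workers only if a node in each of them passes the autoscaler's eligibility checks.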

      Additional info:

      Slack thread: https://redhat-internal.slack.com/archives/CE3ETN3J8/p1770133981878329

      See HIVE-3068 for additional log files.

              mimccune@redhat.com Michael McCune
              mihuang@redhat.com Mingxia Huang
              Paul Rozehnal