OpenShift Bugs / OCPBUGS-59266

[release-4.17] After scale down, the last node has the ToBeDeletedByClusterAutoscaler taint


    • Quality / Stability / Reliability
    • Severity: Important
    • Release Note Type: Bug Fix
    • Release Note Text:
      * Before this update, when a `Machine Set` was scaled down and had reached its minimum size, the Cluster Autoscaler could leave the last remaining node with a `NoSchedule` taint that prevented use of the node. This issue was caused by a counting error in the Cluster Autoscaler. With this release, the counting error has been fixed so that the Cluster Autoscaler works as expected when a `Machine Set` is scaled down and has reached its minimum size. (link:https://issues.redhat.com/browse/OCPBUGS-59266[OCPBUGS-59266])

      Description of problem:

      After scale down, the last node always has the ToBeDeletedByClusterAutoscaler taint. There is no issue on 4.19.0-0.nightly-2025-03-07-175123 or on 4.18 clusters.
      
      OCP-28108:ClusterInfrastructure Cluster should automatically scale up and scale down with clusterautoscaler deployed

      Version-Release number of selected component (if applicable):

      4.19.0-0.nightly-2025-03-20-013047

      How reproducible:

      Always

      Steps to Reproduce:

          1. Create a clusterautoscaler and a machineautoscaler (example manifests and commands after these steps)
          2. Add a workload so the cluster scales up
          3. After the cluster is stable, remove the workload
          4. Check the taints on the last remaining node
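
      For reference, a minimal sketch of the resources and commands used in steps 1-4. The ClusterAutoscaler scale-down timings, the workload name/image/replica count/resource requests, and the workload namespace are illustrative assumptions, not an exact record of the test run; the MachineAutoscaler targets the MachineSet shown in the output below:

      # Step 1: create the ClusterAutoscaler and a MachineAutoscaler for one MachineSet
      $ cat <<EOF | oc apply -f -
      apiVersion: autoscaling.openshift.io/v1
      kind: ClusterAutoscaler
      metadata:
        name: default
      spec:
        scaleDown:
          enabled: true
          delayAfterAdd: 10s
          delayAfterDelete: 10s
          delayAfterFailure: 10s
          unneededTime: 10s
      ---
      apiVersion: autoscaling.openshift.io/v1beta1
      kind: MachineAutoscaler
      metadata:
        name: zhsun-az201-nrwn6-worker-eastus3
        namespace: openshift-machine-api
      spec:
        minReplicas: 1
        maxReplicas: 3
        scaleTargetRef:
          apiVersion: machine.openshift.io/v1beta1
          kind: MachineSet
          name: zhsun-az201-nrwn6-worker-eastus3
      EOF

      # Step 2: a throwaway workload with requests large enough to force a scale-up
      # (image, replica count, and requests are illustrative)
      $ oc create deployment scale-up-test -n default --image=registry.access.redhat.com/ubi9/ubi-minimal -- sleep infinity
      $ oc set resources deployment scale-up-test -n default --requests=cpu=1,memory=2Gi
      $ oc scale deployment scale-up-test -n default --replicas=20

      # Step 3: once new nodes have joined and the cluster is stable, remove the workload
      $ oc delete deployment scale-up-test -n default

      # Step 4: after the MachineSet scales back down to its minimum, inspect the taints
      # on the remaining node from that MachineSet
      $ oc get node zhsun-az201-nrwn6-worker-eastus3-zgntk -o jsonpath='{.spec.taints}'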
          

      Actual results:

      After scale down, the last node always has the ToBeDeletedByClusterAutoscaler taint.
      
       $ oc get node zhsun-az201-nrwn6-worker-eastus3-zgntk -o yaml
        taints:
        - effect: NoSchedule
          key: ToBeDeletedByClusterAutoscaler
          value: "1742477304" 
      
      $ oc get node                                                                            
      NAME                                     STATUS   ROLES                  AGE    VERSION
      zhsun-az201-nrwn6-master-0               Ready    control-plane,master   11h    v1.32.2
      zhsun-az201-nrwn6-master-1               Ready    control-plane,master   11h    v1.32.2
      zhsun-az201-nrwn6-master-2               Ready    control-plane,master   11h    v1.32.2
      zhsun-az201-nrwn6-worker-eastus1-sgblk   Ready    worker                 10h    v1.32.2
      zhsun-az201-nrwn6-worker-eastus2-rhdt4   Ready    worker                 10h    v1.32.2
      zhsun-az201-nrwn6-worker-eastus3-zgntk   Ready    worker                 175m   v1.32.2
      
      $ oc get machineautoscaler                                        
      NAME                               REF KIND     REF NAME                           MIN   MAX   AGE
      zhsun-az201-nrwn6-worker-eastus3   MachineSet   zhsun-az201-nrwn6-worker-eastus3   1     3     33m   
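
      Until the fix is in place, the stale taint can be cleared manually (a workaround for the symptom only, not the fix; node name as in the output above):

      $ oc adm taint nodes zhsun-az201-nrwn6-worker-eastus3-zgntk ToBeDeletedByClusterAutoscaler-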

      Expected results:

      After scale down, the last node should not have the ToBeDeletedByClusterAutoscaler taint.

      Additional info:

      Upstream discussion: https://github.com/kubernetes/autoscaler/issues/7964

      Found while testing https://issues.redhat.com/browse/OCPBUGS-11115

              Assignee: Michael McCune (mimccune@redhat.com)
              Reporter: Zhaohua Sun (rhn-support-zhsun)
              QA Contact: Paul Rozehnal