OpenShift Bugs / OCPBUGS-59266

[release-4.17] After scale down, the last node has the ToBeDeletedByClusterAutoscaler taint


    • Quality / Stability / Reliability
    • Severity: Important
    • Release Note Type: Bug Fix
    • Release Note Text:
      * Before this update, when a `Machine Set` was scaled down and had reached its minimum size, the Cluster Autoscaler could leave the last remaining node with a `NoSchedule` taint that prevented use of the node. This issue was caused by a counting error in the Cluster Autoscaler. With this release, the counting error has been fixed so that the Cluster Autoscaler works as expected when a `Machine Set` is scaled down and has reached its minimum size. (link:https://issues.redhat.com/browse/OCPBUGS-59266[OCPBUGS-59266])

      Description of problem:

      After scale down, the last node always has the ToBeDeletedByClusterAutoscaler taint. There is no issue on 4.19.0-0.nightly-2025-03-07-175123 or on 4.18 clusters.
      
      OCP-28108:ClusterInfrastructure Cluster should automatically scale up and scale down with clusterautoscaler deployed

      Version-Release number of selected component (if applicable):

      4.19.0-0.nightly-2025-03-20-013047

      How reproducible:

      Always

      Steps to Reproduce:

          1. Create a clusterautoscaler and a machineautoscaler (example manifests and commands after these steps)
          2. Add a workload so the cluster scales up
          3. After the cluster is stable, remove the workload
          4. Check the taints on the last remaining node
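
      For reference, a minimal sketch of the resources and commands used in steps 1-4. The ClusterAutoscaler scale-down timings, the workload name/image/replica count/resource requests, and the workload namespace are illustrative assumptions, not an exact record of the test run; the MachineAutoscaler targets the MachineSet shown in the output below:

      # Step 1: create the ClusterAutoscaler and a MachineAutoscaler for one MachineSet
      $ cat <<EOF | oc apply -f -
      apiVersion: autoscaling.openshift.io/v1
      kind: ClusterAutoscaler
      metadata:
        name: default
      spec:
        scaleDown:
          enabled: true
          delayAfterAdd: 10s
          delayAfterDelete: 10s
          delayAfterFailure: 10s
          unneededTime: 10s
      ---
      apiVersion: autoscaling.openshift.io/v1beta1
      kind: MachineAutoscaler
      metadata:
        name: zhsun-az201-nrwn6-worker-eastus3
        namespace: openshift-machine-api
      spec:
        minReplicas: 1
        maxReplicas: 3
        scaleTargetRef:
          apiVersion: machine.openshift.io/v1beta1
          kind: MachineSet
          name: zhsun-az201-nrwn6-worker-eastus3
      EOF

      # Step 2: a throwaway workload with requests large enough to force a scale-up
      # (image, replica count, and requests are illustrative)
      $ oc create deployment scale-up-test -n default --image=registry.access.redhat.com/ubi9/ubi-minimal -- sleep infinity
      $ oc set resources deployment scale-up-test -n default --requests=cpu=1,memory=2Gi
      $ oc scale deployment scale-up-test -n default --replicas=20

      # Step 3: once new nodes have joined and the cluster is stable, remove the workload
      $ oc delete deployment scale-up-test -n default

      # Step 4: after the MachineSet scales back down to its minimum, inspect the taints
      # on the remaining node from that MachineSet
      $ oc get node zhsun-az201-nrwn6-worker-eastus3-zgntk -o jsonpath='{.spec.taints}'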
          

      Actual results:

      After scale down, the last node always has the ToBeDeletedByClusterAutoscaler taint.
      
       $ oc get node zhsun-az201-nrwn6-worker-eastus3-zgntk -o yaml
        taints:
        - effect: NoSchedule
          key: ToBeDeletedByClusterAutoscaler
          value: "1742477304" 
      
      $ oc get node                                                                            
      NAME                                     STATUS   ROLES                  AGE    VERSION
      zhsun-az201-nrwn6-master-0               Ready    control-plane,master   11h    v1.32.2
      zhsun-az201-nrwn6-master-1               Ready    control-plane,master   11h    v1.32.2
      zhsun-az201-nrwn6-master-2               Ready    control-plane,master   11h    v1.32.2
      zhsun-az201-nrwn6-worker-eastus1-sgblk   Ready    worker                 10h    v1.32.2
      zhsun-az201-nrwn6-worker-eastus2-rhdt4   Ready    worker                 10h    v1.32.2
      zhsun-az201-nrwn6-worker-eastus3-zgntk   Ready    worker                 175m   v1.32.2
      
      $ oc get machineautoscaler                                        
      NAME                               REF KIND     REF NAME                           MIN   MAX   AGE
      zhsun-az201-nrwn6-worker-eastus3   MachineSet   zhsun-az201-nrwn6-worker-eastus3   1     3     33m   
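
      Until the fix is in place, the stale taint can be cleared manually (a workaround for the symptom only, not the fix; node name as in the output above):

      $ oc adm taint nodes zhsun-az201-nrwn6-worker-eastus3-zgntk ToBeDeletedByClusterAutoscaler-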

      Expected results:

      After scale down, the last node should not have the ToBeDeletedByClusterAutoscaler taint.

      Additional info:

      Upstream discussion: https://github.com/kubernetes/autoscaler/issues/7964

      Found while testing https://issues.redhat.com/browse/OCPBUGS-11115

              Assignee: Michael McCune (mimccune@redhat.com)
              Reporter: Zhaohua Sun (rhn-support-zhsun)
              QA Contact: Paul Rozehnal