OpenShift Bugs / OCPBUGS-14712

Cluster autoscaler is unable to scale down nodes in Azure cloud


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • Affects Version/s: 4.10
    • Component/s: Cluster Autoscaler
    • Impact: Quality / Stability / Reliability

      Description of problem:

      The cluster autoscaler appears unable to scale down nodes.
      
      In the cluster-autoscaler-default pod logs, we can see that the nodes are tainted for deletion, after which the same two messages repeat: "Skipping {node} from delete consideration - the node is currently being deleted" followed by "Nodegroup is nil for azure:///subscriptions/blah/blah/samenodename".
      
      The customer tried draining the nodes in question to help the autoscaler along and to rule out stray pods (for example, pods holding on to storage on those nodes), but it made no difference.
      
      According to the customer, this morning (after several days of issues, and reportedly without any change on their side) the autoscaler worked fine and successfully scaled down nodes.
      
      More logs and info to follow; see also support case 03528995.
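      For anyone triaging a similar report, one way to confirm the state the logs describe is to list the nodes carrying the autoscaler's deletion taint and grep the autoscaler logs for the two repeating messages quoted above. A minimal sketch, assuming the default openshift-machine-api namespace and the cluster-autoscaler-default deployment name from the description (run against the affected cluster):

```shell
# List nodes the autoscaler has tainted for deletion.
# ToBeDeletedByClusterAutoscaler is the taint the cluster autoscaler
# applies to a node before draining and removing it.
oc get nodes -o json \
  | jq -r '.items[]
           | select(.spec.taints[]?.key == "ToBeDeletedByClusterAutoscaler")
           | .metadata.name'

# Follow the autoscaler logs for the messages quoted in the description.
oc -n openshift-machine-api logs deploy/cluster-autoscaler-default \
  | grep -E 'Skipping .* from delete consideration|Nodegroup is nil'
```

      If nodes show up in the first command but never actually get deleted, that matches the "currently being deleted" / "Nodegroup is nil" loop described above and suggests the autoscaler cannot map the Azure provider ID back to a node group.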

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

       

              mimccune@redhat.com Michael McCune
              rhn-support-dasmall Daniel Small
              Zhaohua Sun Zhaohua Sun