OpenShift Bugs · OCPBUGS-57041

ClusterAutoscaler sometimes creates 2 cluster-autoscaler-default pods


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Fix Version: 4.20.0
    • Affects Versions: 4.17.z, 4.18.z, 4.19.z, 4.20
    • Component: Cluster Autoscaler
    • Quality / Stability / Reliability
    • Sprint: AUTOSCALE - Sprint 272
    • Release Note Status: Proposed
    • Release Note Type: Bug Fix
      *Cause*: Creating a ClusterAutoscaler object in a cluster with the Cluster Autoscaler Operator enabled.
      *Consequence*: When the ClusterAutoscaler is created, two cluster-autoscaler-default pods are sometimes created at the same time in the openshift-machine-api namespace, and one is immediately killed.
      *Fix*: Internal fixes; no further explanation needed.
      *Result*: Only one pod is created now.

      Description of problem:

          When you first create a ClusterAutoscaler object, sometimes 2 cluster-autoscaler-default pods start up, and one is killed automatically.

      Version-Release number of selected component (if applicable):

          4.20

      How reproducible:

          Only the first time the Cluster Autoscaler Operator creates a ClusterAutoscaler object, or when the operator has never observed an existing ClusterAutoscaler object.

      Steps to Reproduce:

        1. Install a non-HCP (hosted control plane) cluster with CAS enabled.
          2. Create a ClusterAutoscaler object like this:
      
      apiVersion: "autoscaling.openshift.io/v1"
      kind: "ClusterAutoscaler"
      metadata:
        name: "default"
      spec:
        logVerbosity: 6
        balanceSimilarNodeGroups: true
        ignoreDaemonsetsUtilization: false
        skipNodesWithLocalStorage: true
        podPriorityThreshold: -10
        resourceLimits:
          maxNodesTotal: 24
          cores:
            min: 8
            max: 128
          memory:
            min: 4
            max: 256
        scaleDown:
          enabled: true
          # How long after scale up that scale down evaluation resumes - if omitted defaults to 10 minutes
          delayAfterAdd: 1m
          # How long after node deletion that scale down evaluation resumes - if omitted defaults to 0 seconds
          delayAfterDelete: 1m
          # How long after scale down failure that scale down evaluation resumes - if omitted defaults to 3 minutes
          delayAfterFailure: 1m
          # How long a node should be unneeded before it is eligible for scale down - if omitted defaults to 10 minutes
          unneededTime: 1m
          # Node utilization level, defined as sum of requested resources divided by capacity, below which a node can be considered for scale down - if omitted defaults to 0.5
          utilizationThreshold: "0.4"     
      
      3. Watch all pods in the openshift-machine-api namespace, and observe that 2 pods are created, but 1 is immediately killed.
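
      For step 3, one way to spot the duplicate from a shell is to filter the `oc get pods -n openshift-machine-api` output for autoscaler pods and count them. The helper name, pod names, and statuses below are illustrative samples, not captured from a real cluster:

      ```shell
      # Hypothetical helper: counts cluster-autoscaler-default pods from
      # `oc get pods -n openshift-machine-api --no-headers` output on stdin.
      # More than one matching pod indicates the duplicate-pod symptom.
      count_cas_pods() {
        grep -c '^cluster-autoscaler-default-'
      }

      # Illustrative sample of what the bug looks like (names are made up):
      sample='cluster-autoscaler-default-6f9c7b-abcde   1/1   Running       0   5s
      cluster-autoscaler-default-6f9c7b-fghij   1/1   Terminating   0   5s
      machine-api-operator-7d5f9c-xk2lp         2/2   Running       0   2h'

      printf '%s\n' "$sample" | count_cas_pods   # prints 2 when the bug reproduces
      ```

      On a live cluster the equivalent check would be `oc get pods -n openshift-machine-api --no-headers | count_cas_pods`, assuming `oc` is logged in to the affected cluster.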
          

      Actual results:

          Two cluster-autoscaler-default pods are started, but one is killed immediately.

      Expected results:

          Only one cluster-autoscaler-default pod should be created.

      Additional info:
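
          The "sometimes, on first creation" behavior described above is consistent with a create race against a cache that has not yet observed the object. Purely as an illustration (hypothetical names throughout; this is NOT the operator's actual code), a minimal sketch of that pattern:

      ```python
      # Illustrative sketch: two reconcile passes both consult a stale cache
      # that has not yet observed the ClusterAutoscaler's workload, so each
      # pass creates a pod; a later sync deletes the extra one -- matching
      # the "two pods, one immediately killed" symptom.

      class FakeCluster:
          def __init__(self):
              self.pods = []

          def create_pod(self, prefix):
              self.pods.append(f"{prefix}-{len(self.pods)}")

          def ensure_single(self, prefix):
              # Later sync pass: keep one matching pod, kill the extras.
              extras = [p for p in self.pods if p.startswith(prefix)][1:]
              for p in extras:
                  self.pods.remove(p)

      def reconcile(cluster, cache):
          # Stale cache: the operator has never observed the object, so it creates.
          if not cache:
              cluster.create_pod("cluster-autoscaler-default")

      cluster = FakeCluster()
      stale_cache = set()              # neither pass has observed the object yet
      reconcile(cluster, stale_cache)  # first pass creates a pod
      reconcile(cluster, stale_cache)  # racing pass creates a duplicate
      assert len(cluster.pods) == 2    # the symptom: two pods at once

      cluster.ensure_single("cluster-autoscaler-default")
      assert len(cluster.pods) == 1    # one pod is immediately killed
      print(cluster.pods)
      ```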

          

              Assignee: Max Cao (rh-ee-macao)
              Reporter: Max Cao (rh-ee-macao)
              QA Contact: Paul Rozehnal
              Votes: 0
              Watchers: 4