OpenShift Hive / HIVE-2415

MachinePool autoscaling misbehaves with certain settings


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • Severity: Moderate

      Summary

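      Setting autoscaling.maxReplicas on a MachinePool to a value lower than the number of MachineSets it manages (one per availability zone) causes problems: the cluster scales down, but the MachinePool status is no longer updated.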

      Steps to Reproduce

      Deploy a cluster to AWS. In the MachinePool, specify three availability zones (AZs), similar to the following:

      apiVersion: hive.openshift.io/v1
      kind: MachinePool
      metadata:
        name: kevin-3-zone-worker
        namespace: kevin-3-zone
      spec:
        clusterDeploymentRef:
          name: kevin-3-zone
        name: worker
        platform:
          aws:
            rootVolume:
              iops: 2000
              size: 100
              type: io1
            type: m5.xlarge
            zones:
            - us-east-1a
            - us-east-1b
            - us-east-1c
        replicas: 3
      

      After the cluster has been deployed, enable autoscaling on the MachinePool with maxReplicas lower than the number of MachineSets: add the following to the spec and remove spec.replicas:

      spec:
        autoscaling:
          maxReplicas: 2
          minReplicas: 1
      

      Expected Results

      At least 1 node is removed from the cluster, and the MachinePool status is updated to show the new number of replicas.

      Actual Results

      A node is removed from the cluster, but the MachinePool status is not updated:

      status:
      ...
        machineSets:
        - maxReplicas: 1
          minReplicas: 1
          name: kevin-3-zone-m5pfj-worker-us-east-1a
          readyReplicas: 1
          replicas: 1
        - maxReplicas: 1
          minReplicas: 1
          name: kevin-3-zone-m5pfj-worker-us-east-1b
          readyReplicas: 1
          replicas: 1
        - maxReplicas: 1
          minReplicas: 1
          name: kevin-3-zone-m5pfj-worker-us-east-1c
          readyReplicas: 1
          replicas: 1
        replicas: 3
      

      Additional Information

      Hive may be failing when it attempts to create a MachineAutoscaler with maxReplicas set to 0 on the cluster, and that failure may be what prevents the MachinePool status from being updated. See the Slack discussions in #forum-hive and #forum-ocm-cloud.
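      A minimal Go sketch of how maxReplicas: 2 spread across three MachineSets could produce a MachineAutoscaler with maxReplicas 0. It assumes that the pool-level minReplicas and maxReplicas are divided evenly across the MachineSets, with any remainder going to the first ones. splitReplicas is a hypothetical helper written for illustration, not Hive's actual code.

```go
package main

import "fmt"

// splitReplicas divides a pool-level replica count across n MachineSets.
// ASSUMPTION (hypothetical helper): an even split where the remainder goes
// to the first MachineSets. This is not copied from Hive's controller.
func splitReplicas(total, n int) []int {
	out := make([]int, n)
	for i := range out {
		out[i] = total / n
		// The first total%n MachineSets each receive one extra replica.
		if i < total%n {
			out[i]++
		}
	}
	return out
}

func main() {
	zones := []string{"us-east-1a", "us-east-1b", "us-east-1c"}
	minPer := splitReplicas(1, len(zones)) // autoscaling.minReplicas: 1
	maxPer := splitReplicas(2, len(zones)) // autoscaling.maxReplicas: 2
	for i, z := range zones {
		fmt.Printf("%s min=%d max=%d\n", z, minPer[i], maxPer[i])
		// A zero maximum is the case flagged in the Slack discussion.
		if maxPer[i] == 0 {
			fmt.Printf("  %s would get a MachineAutoscaler with maxReplicas=0\n", z)
		}
	}
}
```

      Under that assumption, us-east-1a gets min=1 max=1, us-east-1b gets min=0 max=1, and us-east-1c gets min=0 max=0. Whenever maxReplicas is lower than the number of MachineSets, at least one MachineSet ends up with a maximum of 0.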

      Attachments

        1. hive-controllers-64c579dfb7-tsgzh.log (4.89 MB, Jianping Shu)
        2. machinepool_controller.go (50 kB, Jianping Shu)

      People: Eric Fried, Kevin Cormier, Mingxia Huang
