Bug
Resolution: Unresolved
Minor
None
4.12
None
Moderate
None
CLOUD Sprint 239, CLOUD Sprint 240, CLOUD Sprint 241, CLOUD Sprint 242, CLOUD Sprint 243
5
False
-
Description of problem:
When the ClusterAutoscaler is configured with `balanceSimilarNodeGroups: true` and some MachineSets have to scale up from 0, the autoscaler first scales up only those MachineSets. It scales up the other node groups only after those MachineSets are full.
Version-Release number of selected component (if applicable):
4.12.0-0.nightly-2022-10-05-053337
How reproducible:
always
Steps to Reproduce:
1. Create a ClusterAutoscaler on GCP:
    apiVersion: "autoscaling.openshift.io/v1"
    kind: "ClusterAutoscaler"
    metadata:
      name: "default"
    spec:
      balanceSimilarNodeGroups: true
      balancingIgnoredLabels: ["topology.gke.io/zone"]
      resourceLimits:
        maxNodesTotal: 20
      scaleDown:
        enabled: true
        delayAfterAdd: 10s
        delayAfterDelete: 10s
        delayAfterFailure: 10s
        unneededTime: 10s
2. Create MachineAutoscalers. Some of the MachineSets need to scale up from 0:
    $ oc get machineset
    NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
    zhsungcp10-lmfbm-worker-a   1         1         1       1           128m
    zhsungcp10-lmfbm-worker-b   0         0                             128m
    zhsungcp10-lmfbm-worker-c   1         1         1       1           128m
    zhsungcp10-lmfbm-worker-f   0         0                             128m
    $ oc get machineautoscaler
    NAME                  REF KIND     REF NAME                    MIN   MAX   AGE
    machineautoscaler-a   MachineSet   zhsungcp10-lmfbm-worker-a   1     20    39m
    machineautoscaler-b   MachineSet   zhsungcp10-lmfbm-worker-b   0     19    39m
    machineautoscaler-c   MachineSet   zhsungcp10-lmfbm-worker-c   1     20    13s
    machineautoscaler-f   MachineSet   zhsungcp10-lmfbm-worker-f   0     19    39m
3. Create a workload.
4. Check the MachineSets and the cluster-autoscaler log.
Actual results:
When some MachineSets have to scale up from 0, the autoscaler balances the scale-up only between those MachineSets. It scales up the other node groups only after those MachineSets are full.
    I1010 08:39:48.865639 1 scale_up.go:481] Estimated 26 nodes needed in MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b
    I1010 08:39:48.865645 1 scale_up.go:486] Capping size to max cluster total size (30)
    I1010 08:39:49.605437 1 scale_up.go:591] Splitting scale-up between 2 similar node groups: {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b, MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f}
    I1010 08:39:49.605472 1 scale_up.go:601] Final scale-up plan: [{MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b 0->13 (max: 19)} {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f 0->12 (max: 19)}]
    I1010 08:39:49.605492 1 scale_up.go:700] Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b size to 13
    I1010 08:39:50.209449 1 scale_up.go:700] Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f size to 12
    $ oc get machineset
    NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
    zhsungcp10-lmfbm-worker-a   1         1         1       1           130m
    zhsungcp10-lmfbm-worker-b   13        13                            130m
    zhsungcp10-lmfbm-worker-c   1         1         1       1           130m
    zhsungcp10-lmfbm-worker-f   12        12                            130m
Expected results:
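The scale-up is balanced across all similar node groups, including the MachineSets that already have nodes (worker-a and worker-c).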
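The following Python sketch shows the difference between the actual and expected results concretely. It models a balanced scale-up as simple leveling: each new node goes to the smallest group that is still below its maximum. This is a simplified model for illustration only, not the cluster-autoscaler implementation. `NodeGroup` and `balance_scale_up` are hypothetical names, short names such as `worker-b` abbreviate the MachineSet names above, and the total of 25 new nodes is taken from the logged plan (13 + 12). When the model is restricted to worker-b and worker-f, it reproduces the logged 0->13 / 0->12 plan. When it is applied to all four MachineSets, it produces the roughly even split that the expected result describes.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    # Hypothetical stand-in for a MachineSet as seen by the autoscaler.
    name: str
    current: int
    max_size: int


def balance_scale_up(groups, new_nodes):
    """Simplified leveling model (assumption, not the upstream algorithm):
    hand out new_nodes one at a time, each to the smallest group that is
    still below max_size. Ties go to the group listed first."""
    targets = {g.name: g.current for g in groups}
    for _ in range(new_nodes):
        candidates = [g for g in groups if targets[g.name] < g.max_size]
        if not candidates:
            break  # every group is at its max; the rest cannot be placed
        smallest = min(candidates, key=lambda g: targets[g.name])
        targets[smallest.name] += 1
    return targets


# Observed: only the two scale-from-zero MachineSets are treated as similar.
observed = balance_scale_up(
    [NodeGroup("worker-b", 0, 19), NodeGroup("worker-f", 0, 19)], 25
)
print(observed)  # {'worker-b': 13, 'worker-f': 12}

# Expected: all four MachineSets share the same 25 new nodes.
expected = balance_scale_up(
    [
        NodeGroup("worker-a", 1, 20),
        NodeGroup("worker-b", 0, 19),
        NodeGroup("worker-c", 1, 20),
        NodeGroup("worker-f", 0, 19),
    ],
    25,
)
print(expected)  # {'worker-a': 7, 'worker-b': 7, 'worker-c': 7, 'worker-f': 6}
```

With a maximum of 9 per group and 15 new nodes, the same model gives 8 and 7 for worker-b and worker-f. This matches the second run under Additional info.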
Additional info:
Other testing on GCP:
    $ oc get machineautoscaler
    NAME                  REF KIND     REF NAME                    MIN   MAX   AGE
    machineautoscaler-a   MachineSet   zhsungcp10-lmfbm-worker-a   1     10    3m41s
    machineautoscaler-b   MachineSet   zhsungcp10-lmfbm-worker-b   1     10    3m55s
    machineautoscaler-f   MachineSet   zhsungcp10-lmfbm-worker-f   0     10    4m15s
After adding a workload:
    Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f size to 10
    Splitting scale-up between 2 similar node groups: {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b, MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-a}
    I1010 07:46:11.862566 1 scale_up.go:601] Final scale-up plan: [{MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b 1->3 (max: 10)} {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-a 1->3 (max: 10)}]
--------
    $ oc get machineautoscaler
    NAME                  REF KIND     REF NAME                    MIN   MAX   AGE
    machineautoscaler-a   MachineSet   zhsungcp10-lmfbm-worker-a   1     10    22m
    machineautoscaler-b   MachineSet   zhsungcp10-lmfbm-worker-b   0     9     22m
    machineautoscaler-f   MachineSet   zhsungcp10-lmfbm-worker-f   0     9     22m
After adding a workload:
    Capping size to max cluster total size (20)
    I1010 08:32:19.557095 1 scale_up.go:591] Splitting scale-up between 2 similar node groups: {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b, MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f}
    I1010 08:32:19.557132 1 scale_up.go:601] Final scale-up plan: [{MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b 0->8 (max: 9)} {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f 0->7 (max: 9)}]
    I1010 08:32:19.557149 1 scale_up.go:700] Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b size to 8
    I1010 08:32:20.161422 1 scale_up.go:700] Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f size to 7
    $ oc get machineset
    NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
    zhsungcp10-lmfbm-worker-a   1         1         1       1           123m
    zhsungcp10-lmfbm-worker-b   8         8                             123m
    zhsungcp10-lmfbm-worker-c   1         1         1       1           123m
    zhsungcp10-lmfbm-worker-f   7         7                             123m