Type: Bug
Resolution: Unresolved
Priority: Minor
Fix Version/s: None
Affects Version/s: 4.12
Component/s: Cloud Compute / Cluster Autoscaler
Labels: None
Severity: Moderate
Regression: None
Sprint: CLOUD Sprint 239, CLOUD Sprint 240, CLOUD Sprint 241, CLOUD Sprint 242, CLOUD Sprint 243
sprint_count: 5
Blocked: False
Blocked Reason: None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When the ClusterAutoscaler is configured with `balanceSimilarNodeGroups` set to `true` and some MachineSets must scale up from 0, the autoscaler scales up only those MachineSets at first. It scales up the other node groups only after those MachineSets are full.

Version-Release number of selected component (if applicable):

4.12.0-0.nightly-2022-10-05-053337

How reproducible:

always

Steps to Reproduce:

1. Create a ClusterAutoscaler on GCP:
apiVersion: "autoscaling.openshift.io/v1"
kind: "ClusterAutoscaler"
metadata:
  name: "default"
spec:
  balanceSimilarNodeGroups: true
  balancingIgnoredLabels: ["topology.gke.io/zone"]
  resourceLimits:
    maxNodesTotal: 20
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    unneededTime: 10s
2. Create MachineAutoscalers; some of the MachineSets need to scale up from 0:
$ oc get machineset
NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
zhsungcp10-lmfbm-worker-a   1         1         1       1           128m
zhsungcp10-lmfbm-worker-b   0         0                             128m
zhsungcp10-lmfbm-worker-c   1         1         1       1           128m
zhsungcp10-lmfbm-worker-f   0         0                             128m

$ oc get machineautoscaler
NAME                  REF KIND     REF NAME                    MIN   MAX   AGE
machineautoscaler-a   MachineSet   zhsungcp10-lmfbm-worker-a   1     20    39m
machineautoscaler-b   MachineSet   zhsungcp10-lmfbm-worker-b   0     19    39m
machineautoscaler-c   MachineSet   zhsungcp10-lmfbm-worker-c   1     20    13s
machineautoscaler-f   MachineSet   zhsungcp10-lmfbm-worker-f   0     19    39m
3. Create a workload.
4. Check the MachineSets and the cluster-autoscaler log.

Actual results:

When some MachineSets scale up from 0, the autoscaler balances the scale-up only among those MachineSets (here worker-b and worker-f). It scales up the other node groups only after those MachineSets are full.

I1010 08:39:48.865639       1 scale_up.go:481] Estimated 26 nodes needed in MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b
I1010 08:39:48.865645       1 scale_up.go:486] Capping size to max cluster total size (30)
I1010 08:39:49.605437       1 scale_up.go:591] Splitting scale-up between 2 similar node groups: {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b, MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f}
I1010 08:39:49.605472       1 scale_up.go:601] Final scale-up plan: [{MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b 0->13 (max: 19)} {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f 0->12 (max: 19)}]
I1010 08:39:49.605492       1 scale_up.go:700] Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b size to 13
I1010 08:39:50.209449       1 scale_up.go:700] Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f size to 12

$ oc get machineset
NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
zhsungcp10-lmfbm-worker-a   1         1         1       1           130m
zhsungcp10-lmfbm-worker-b   13        13                            130m
zhsungcp10-lmfbm-worker-c   1         1         1       1           130m
zhsungcp10-lmfbm-worker-f   12        12                            130m
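One quick way to quantify the imbalance from step 4 is to compare the DESIRED column across MachineSets. The sketch below is a hypothetical helper (the names `parse_desired` and `desired_spread` are made up for illustration) that assumes the default `oc get machineset` column layout shown above; it is not part of `oc` or OpenShift. With groups that still have headroom, a fully balanced scale-up would leave a spread of at most 1.

```python
from __future__ import annotations


def parse_desired(table: str) -> dict[str, int]:
    """Map MachineSet name -> DESIRED replicas from `oc get machineset` text.

    Assumes the header is the first non-empty line and that NAME and DESIRED
    come before the READY/AVAILABLE columns, which may be blank.
    """
    lines = [line for line in table.strip().splitlines() if line.strip()]
    desired_col = lines[0].split().index("DESIRED")
    result = {}
    for line in lines[1:]:
        fields = line.split()
        result[fields[0]] = int(fields[desired_col])
    return result


def desired_spread(table: str) -> int:
    """Difference between the largest and smallest DESIRED value."""
    sizes = parse_desired(table).values()
    return max(sizes) - min(sizes)


# Output copied from the "Actual results" section above.
OUTPUT = """\
NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
zhsungcp10-lmfbm-worker-a   1         1         1       1           130m
zhsungcp10-lmfbm-worker-b   13        13                            130m
zhsungcp10-lmfbm-worker-c   1         1         1       1           130m
zhsungcp10-lmfbm-worker-f   12        12                            130m
"""

print(parse_desired(OUTPUT))
print(desired_spread(OUTPUT))  # 12
```

A spread of 12 here reflects worker-a and worker-c staying at 1 while worker-b and worker-f absorbed the whole scale-up.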

Expected results:

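The scale-up is balanced across all similar node groups (worker-a, worker-b, worker-c, and worker-f), not only the ones scaling up from 0.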
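The difference between the actual and expected results can be illustrated with a simplified balancing model: each new node goes to the group in the similar set that currently has the fewest nodes, as long as it is below its maximum. This is a hypothetical sketch (`NodeGroup` and `balance_scale_up` are illustrative names), not the cluster-autoscaler implementation; the 25-node delta comes from the first scale-up plan above and the min/max values from the MachineAutoscalers in step 2.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeGroup:
    name: str
    current: int
    max_size: int


def balance_scale_up(groups: list[NodeGroup], delta: int) -> dict[str, int]:
    """Spread `delta` new nodes over `groups`, one node at a time.

    Each node goes to the group with the smallest target size that is still
    below its max; ties go to the group listed first. Nodes that do not fit
    (every group already at max) are dropped.
    """
    target = {g.name: g.current for g in groups}
    for _ in range(delta):
        candidates = [g for g in groups if target[g.name] < g.max_size]
        if not candidates:
            break  # every group is at its maximum size
        # min() returns the first minimal element, so ties keep input order.
        smallest = min(candidates, key=lambda g: target[g.name])
        target[smallest.name] += 1
    return target


# Actual: only the two scale-from-0 groups were treated as similar.
observed = balance_scale_up(
    [NodeGroup("worker-b", 0, 19), NodeGroup("worker-f", 0, 19)], 25)
print(observed)  # {'worker-b': 13, 'worker-f': 12}

# Expected: all four groups share the same 25 new nodes.
expected = balance_scale_up(
    [NodeGroup("worker-a", 1, 20), NodeGroup("worker-b", 0, 19),
     NodeGroup("worker-c", 1, 20), NodeGroup("worker-f", 0, 19)], 25)
print(expected)  # {'worker-a': 7, 'worker-b': 7, 'worker-c': 7, 'worker-f': 6}
```

The model reproduces the logged 13/12 split only when worker-b and worker-f are passed in alone, which is consistent with the log line "Splitting scale-up between 2 similar node groups": the split itself looks balanced, but the set of groups considered similar is narrower than expected.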

Additional info:

Other testing on GCP:
$ oc get machineautoscaler
NAME                  REF KIND     REF NAME                    MIN   MAX   AGE
machineautoscaler-a   MachineSet   zhsungcp10-lmfbm-worker-a   1     10    3m41s
machineautoscaler-b   MachineSet   zhsungcp10-lmfbm-worker-b   1     10    3m55s
machineautoscaler-f   MachineSet   zhsungcp10-lmfbm-worker-f   0     10    4m15s

Add a workload:
Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f size to 10
Splitting scale-up between 2 similar node groups: {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b, MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-a}
I1010 07:46:11.862566       1 scale_up.go:601] Final scale-up plan: [{MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b 1->3 (max: 10)} {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-a 1->3 (max: 10)}]

--------
$ oc get machineautoscaler
NAME                  REF KIND     REF NAME                    MIN   MAX   AGE
machineautoscaler-a   MachineSet   zhsungcp10-lmfbm-worker-a   1     10    22m
machineautoscaler-b   MachineSet   zhsungcp10-lmfbm-worker-b   0     9     22m
machineautoscaler-f   MachineSet   zhsungcp10-lmfbm-worker-f   0     9     22m

Add a workload:
Capping size to max cluster total size (20)
I1010 08:32:19.557095       1 scale_up.go:591] Splitting scale-up between 2 similar node groups: {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b, MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f}
I1010 08:32:19.557132       1 scale_up.go:601] Final scale-up plan: [{MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b 0->8 (max: 9)} {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f 0->7 (max: 9)}]
I1010 08:32:19.557149       1 scale_up.go:700] Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b size to 8
I1010 08:32:20.161422       1 scale_up.go:700] Scale-up: setting group MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f size to 7

$ oc get machineset
NAME                        DESIRED   CURRENT   READY   AVAILABLE   AGE
zhsungcp10-lmfbm-worker-a   1         1         1       1           123m
zhsungcp10-lmfbm-worker-b   8         8                             123m
zhsungcp10-lmfbm-worker-c   1         1         1       1           123m
zhsungcp10-lmfbm-worker-f   7         7                             123m
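Because the symptom shows up in which groups the autoscaler includes in its plan, the "Final scale-up plan" log line is worth checking directly in step 4. A small hypothetical helper, assuming the entry format seen in the sample log lines above (the regex is written against these lines only, not against a documented log format):

```python
import re

# Matches entries such as
# {MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b 0->8 (max: 9)}
PLAN_ENTRY = re.compile(r"\{(\S+) (\d+)->(\d+) \(max: (\d+)\)\}")


def parse_scale_up_plan(line: str) -> list:
    """Return (group, from, to, max) tuples from a 'Final scale-up plan' line."""
    return [
        (name, int(old), int(new), int(max_size))
        for name, old, new, max_size in PLAN_ENTRY.findall(line)
    ]


# Log line copied from the second test above.
LOG_LINE = (
    "I1010 08:32:19.557132       1 scale_up.go:601] Final scale-up plan: "
    "[{MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-b 0->8 (max: 9)} "
    "{MachineSet/openshift-machine-api/zhsungcp10-lmfbm-worker-f 0->7 (max: 9)}]"
)

for group, old, new, max_size in parse_scale_up_plan(LOG_LINE):
    # Print only the MachineSet name, dropping the kind/namespace prefix.
    print(f"{group.rsplit('/', 1)[-1]}: {old} -> {new} (max {max_size})")
```

Only worker-b and worker-f appear in this plan, so worker-a and worker-c were not part of the similar set for this scale-up.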

Links to:

scaling from 0 only scales one of the ASG's detected in spite of balance-similar-node-groups

Assignee: Michael McCune
Reporter: Zhaohua Sun
QA Contact: Zhaohua Sun
Votes: 0
Watchers: 3
Created: 2022/10/12 7:57 AM
Updated: 2024/05/23 1:53 PM
