- Bug
- Resolution: Unresolved
- Normal
- None
- 4.15
- None
- Moderate
- Yes
- False
Description of problem:
The autoscaler cannot scale down a node group that contains Failed machines once maxNodeProvisionTime is reached.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-11-16-173006
This case works on 4.14. It also passed when I tested earlier on 4.15.0-0.nightly-2023-10-09-101435.
How reproducible:
Always
Steps to Reproduce:
1. Create a machineset with replicas=0 and an invalid instanceType

liuhuali@Lius-MacBook-Pro huali-test % oc get machineset huliu-aws17a-h6mv8-worker-us-east-2a -oyaml>ms1.yaml
liuhuali@Lius-MacBook-Pro huali-test % vim ms1.yaml
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml
machineset.machine.openshift.io/huliu-aws17a-h6mv8-worker-us-east-2aa created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE     TYPE         REGION      ZONE         AGE
huliu-aws17a-h6mv8-master-0                   Running   m6i.xlarge   us-east-2   us-east-2a   119m
huliu-aws17a-h6mv8-master-1                   Running   m6i.xlarge   us-east-2   us-east-2b   119m
huliu-aws17a-h6mv8-master-2                   Running   m6i.xlarge   us-east-2   us-east-2c   119m
huliu-aws17a-h6mv8-worker-us-east-2a-vwgfx    Running   m6i.xlarge   us-east-2   us-east-2a   115m
huliu-aws17a-h6mv8-worker-us-east-2b-d88qr    Running   m6i.xlarge   us-east-2   us-east-2b   115m
huliu-aws17a-h6mv8-worker-us-east-2c-xmnbg    Running   m6i.xlarge   us-east-2   us-east-2c   115m
liuhuali@Lius-MacBook-Pro huali-test % oc get machineset
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
huliu-aws17a-h6mv8-worker-us-east-2a    1         1         1       1           119m
huliu-aws17a-h6mv8-worker-us-east-2aa   0         0                             16s
huliu-aws17a-h6mv8-worker-us-east-2b    1         1         1       1           119m
huliu-aws17a-h6mv8-worker-us-east-2c    1         1         1       1           119m

2. Create the clusterautoscaler, machineautoscaler and workload

liuhuali@Lius-MacBook-Pro huali-test % oc create -f clusterautoscale.yaml
clusterautoscaler.autoscaling.openshift.io/default created
liuhuali@Lius-MacBook-Pro huali-test % oc create -f machineautoscaler.yaml
machineautoscaler.autoscaling.openshift.io/machineautoscaler-test2 created
liuhuali@Lius-MacBook-Pro huali-test % oc get machineautoscaler
NAME                      REF KIND     REF NAME                                MIN   MAX   AGE
machineautoscaler-test2   MachineSet   huliu-aws17a-h6mv8-worker-us-east-2aa   0     2     11s
liuhuali@Lius-MacBook-Pro huali-test % oc create -f workloadauto.yaml
job.batch/workload created

3. The machineset is scaled up but never scaled down

liuhuali@Lius-MacBook-Pro huali-test % oc get machineset
NAME                                    DESIRED   CURRENT   READY   AVAILABLE   AGE
huliu-aws17a-h6mv8-worker-us-east-2a    1         1         1       1           122m
huliu-aws17a-h6mv8-worker-us-east-2aa   2         2                             3m38s
huliu-aws17a-h6mv8-worker-us-east-2b    1         1         1       1           122m
huliu-aws17a-h6mv8-worker-us-east-2c    1         1         1       1           122m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE     TYPE         REGION      ZONE         AGE
huliu-aws17a-h6mv8-master-0                   Running   m6i.xlarge   us-east-2   us-east-2a   122m
huliu-aws17a-h6mv8-master-1                   Running   m6i.xlarge   us-east-2   us-east-2b   122m
huliu-aws17a-h6mv8-master-2                   Running   m6i.xlarge   us-east-2   us-east-2c   122m
huliu-aws17a-h6mv8-worker-us-east-2a-vwgfx    Running   m6i.xlarge   us-east-2   us-east-2a   118m
huliu-aws17a-h6mv8-worker-us-east-2aa-kpds5   Failed                                          5s
huliu-aws17a-h6mv8-worker-us-east-2aa-zt4bn   Failed                                          5s
huliu-aws17a-h6mv8-worker-us-east-2b-d88qr    Running   m6i.xlarge   us-east-2   us-east-2b   118m
huliu-aws17a-h6mv8-worker-us-east-2c-xmnbg    Running   m6i.xlarge   us-east-2   us-east-2c   118m
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE     TYPE         REGION      ZONE         AGE
huliu-aws17a-h6mv8-master-0                   Running   m6i.xlarge   us-east-2   us-east-2a   3h45m
huliu-aws17a-h6mv8-master-1                   Running   m6i.xlarge   us-east-2   us-east-2b   3h45m
huliu-aws17a-h6mv8-master-2                   Running   m6i.xlarge   us-east-2   us-east-2c   3h45m
huliu-aws17a-h6mv8-worker-us-east-2a-vwgfx    Running   m6i.xlarge   us-east-2   us-east-2a   3h40m
huliu-aws17a-h6mv8-worker-us-east-2aa-kpds5   Failed                                          102m
huliu-aws17a-h6mv8-worker-us-east-2aa-zt4bn   Failed                                          102m
huliu-aws17a-h6mv8-worker-us-east-2b-d88qr    Running   m6i.xlarge   us-east-2   us-east-2b   3h40m
huliu-aws17a-h6mv8-worker-us-east-2c-xmnbg    Running   m6i.xlarge   us-east-2   us-east-2c   3h40m
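The report does not include the contents of clusterautoscale.yaml or machineautoscaler.yaml. For anyone reproducing step 2, a minimal sketch of what the two manifests might look like follows. The resource names, the target machineset and the min/max replica counts come from the output above; the maxNodeProvisionTime and scaleDown values are assumptions chosen only to make the timeout short, not the values used in the original test.

```yaml
# Hypothetical clusterautoscale.yaml: the timing values below are assumptions.
apiVersion: autoscaling.openshift.io/v1
kind: ClusterAutoscaler
metadata:
  name: default
spec:
  # Assumed value; nodes that fail to register within this window
  # are expected to be removed by the autoscaler.
  maxNodeProvisionTime: 15m
  scaleDown:
    enabled: true
    delayAfterAdd: 10s
    delayAfterDelete: 10s
    delayAfterFailure: 10s
    unneededTime: 10s
---
# Hypothetical machineautoscaler.yaml: name, target and replica bounds
# match the `oc get machineautoscaler` output in step 2.
apiVersion: autoscaling.openshift.io/v1beta1
kind: MachineAutoscaler
metadata:
  name: machineautoscaler-test2
  namespace: openshift-machine-api
spec:
  minReplicas: 0
  maxReplicas: 2
  scaleTargetRef:
    apiVersion: machine.openshift.io/v1beta1
    kind: MachineSet
    name: huliu-aws17a-h6mv8-worker-us-east-2aa
```

With a configuration along these lines, the two Failed machines in step 3 would be expected to be deleted and the machineset returned to 0 replicas shortly after maxNodeProvisionTime elapses, rather than remaining for 102m.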
Actual results:
The autoscaler does not scale down the node group that contains Failed machines after maxNodeProvisionTime is reached.
Expected results:
The autoscaler scales down the node group that contains Failed machines after maxNodeProvisionTime is reached.
Additional info:
You can also follow the automation steps here: https://github.com/openshift/openshift-tests-private/blob/master/test/extended/clusterinfrastructure/autoscaler.go#L324-L389
Found this while running the CAO regression for https://issues.redhat.com/browse/OCPCLOUD-2137
must-gather: https://drive.google.com/file/d/1TQJArXVH6mbplNULzxSLJG8ue3GZ0LtO/view?usp=sharing