Details
-
Bug
-
Resolution: Not a Bug
-
Normal
-
None
-
4.11.z, 4.10.z
-
None
-
Moderate
-
False
-
Description
Description of problem:
When using the ClusterAPI cloud provider, scale-down reconciliation fails when the MachineSet's current replica count lies above the `max` (or below the `min`) of its MachineAutoscaler, even though the requested change moves the size toward that range. [0][1][5]
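To make the failure concrete, here is a minimal Go sketch of a range check that ignores the current replica count. It is not the upstream `SetSize()` code. The `nodeGroup` type and `setSizeStrict` method are hypothetical, the above-max error text is copied from the log in [5], the below-min message is an assumed counterpart, and the `min` of 1 is an assumption because the report only shows `max:1`.

```go
package main

import "fmt"

// nodeGroup is a hypothetical stand-in for a ClusterAPI-backed node group
// (for example a MachineSet) with bounds taken from its MachineAutoscaler.
type nodeGroup struct {
	replicas int
	minSize  int
	maxSize  int
}

// setSizeStrict rejects any target outside [minSize, maxSize] without looking
// at the current replica count. The above-max message mirrors the log in [5];
// the below-min message is an assumed counterpart.
func (g *nodeGroup) setSizeStrict(n int) error {
	switch {
	case n > g.maxSize:
		return fmt.Errorf("size increase too large - desired:%d max:%d", n, g.maxSize)
	case n < g.minSize:
		return fmt.Errorf("size decrease too large - desired:%d min:%d", n, g.minSize)
	}
	g.replicas = n
	return nil
}

func main() {
	// Scenario from the report: MachineSet scaled to 15, MachineAutoscaler max of 1
	// (min of 1 is assumed).
	g := &nodeGroup{replicas: 15, minSize: 1, maxSize: 1}
	// Deleting one node requests replicas-1 = 14, which is still above max.
	err := g.setSizeStrict(g.replicas - 1)
	fmt.Println(err)        // size increase too large - desired:14 max:1
	fmt.Println(g.replicas) // 15: the resize was refused
}
```

Deleting one node from the 15-replica MachineSet asks for 14, and the check refuses it even though 14 is closer to `max` than 15, which matches the error in [5].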
Version-Release number of selected component (if applicable):
OpenShift 4.10.26
How reproducible:
Every time, using the examples provided.
Steps to Reproduce:
1. Deploy a cluster in Azure.
2. Scale the MachineSet up to a high replica count (for example, 15).
3. Add the ClusterAutoscaler and MachineAutoscaler (examples provided).
4. Wait for the autoscaler to attempt to shrink the cluster.
Actual results:
The ClusterAutoscaler tags Nodes for removal and then logs an error [5]. The autoscaler then removes the tags and retries in a loop with backoff, so the cluster is never scaled down.
Expected results:
The autoscaler scales the cluster down.
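The expected behavior can be sketched as a check that still refuses to move further out of range but accepts a step that brings an out-of-range group closer to `[min, max]`. This is a hypothetical illustration, not the upstream code or an accepted fix: the `nodeGroup` type and `setSizeTowardRange` method are invented for the example, the below-min message is assumed, and `min`/`max` of 1 are assumptions (the report only shows `max:1`).

```go
package main

import "fmt"

// nodeGroup is a hypothetical stand-in for a ClusterAPI-backed node group.
type nodeGroup struct {
	replicas int
	minSize  int
	maxSize  int
}

// setSizeTowardRange accepts a target outside [minSize, maxSize] only when it
// is strictly closer to the range than the current size, i.e. shrinking while
// above max or growing while below min.
func (g *nodeGroup) setSizeTowardRange(n int) error {
	switch {
	case n > g.maxSize && n >= g.replicas:
		return fmt.Errorf("size increase too large - desired:%d max:%d", n, g.maxSize)
	case n < g.minSize && n <= g.replicas:
		return fmt.Errorf("size decrease too large - desired:%d min:%d", n, g.minSize)
	}
	g.replicas = n
	return nil
}

func main() {
	g := &nodeGroup{replicas: 15, minSize: 1, maxSize: 1}
	// Simulate repeated single-node deletions, one SetSize-style call each.
	for g.replicas > g.maxSize {
		if err := g.setSizeTowardRange(g.replicas - 1); err != nil {
			fmt.Println("rejected:", err)
			return
		}
	}
	fmt.Println("converged at", g.replicas) // converged at 1
	// Growing past max from inside the range is still refused.
	fmt.Println(g.setSizeTowardRange(g.replicas + 1)) // size increase too large - desired:2 max:1
}
```

Comparing against the current replica count with `>=` and `<=` means a resize that stays put or moves further out of range is still refused, and a step that would overshoot past the opposite bound (for example, dropping below `min` while shrinking from above `max`) is rejected as well.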
Additional info:
We see that the ClusterAPI provider calls the `SetSize()` function inside `DeleteNodes()` [2]. `SetSize()` does not take the current replica count into account, so the requested size can fall above or below the allowed range [3]. This should not cause reconciliation to fail when the change is heading towards the desired outcome. This does not appear to be resolved in the current version of the Autoscaler.

Resources:
[0] Showing that the ClusterAPI provider is in use:
~~~
inspect.local.8845144723344453535|⇒ cat namespaces/openshift-machine-api/pods/cluster-autoscaler-default-55b89484b8-l4hmg/cluster-autoscaler/cluster-autoscaler/logs/current.log | grep cloud_provider_builder.go:29
2022-12-06T05:36:26.946717757Z I1206 05:36:26.946666 1 cloud_provider_builder.go:29] Building clusterapi cloud provider.
~~~
[1] https://github.com/openshift/kubernetes-autoscaler/blob/012b9608f/cluster-autoscaler/cloudprovider/builder/cloud_provider_builder.go#L29
[2] https://github.com/openshift/kubernetes-autoscaler/blob/012b9608f/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_nodegroup.go/#L157-L161
[3] https://github.com/openshift/kubernetes-autoscaler/blob/012b9608f/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go/#L101-L132
[4] https://github.com/openshift/kubernetes-autoscaler/blob/master/cluster-autoscaler/cloudprovider/clusterapi/clusterapi_unstructured.go/#L106-L112
[5] The error which is displayed:
~~~
rg too
1571:2022-12-06T06:13:57.591375632Z E1206 06:13:57.591325 1 scale_down.go:1146] Problem with empty node deletion: failed to delete mwasher-03374550-7sml7-worker-australiasoutheast-5wg9j: size increase too large - desired:14 max:1
~~~