- What is the nature and description of the request?
Today, automation jobs run as bare pods that are not backed by controller objects[1], which prevents the cluster autoscaler from completing scale-down events[2] on the nodes where they run.
- Why does the customer need this? (List the business requirements here)
The customer would like to leverage the cluster autoscaler to scale the worker nodes needed to run large numbers of Ansible automation jobs, instead of running a fixed number of worker nodes and having jobs sit in a pending state when the cluster is at full capacity. Today this works fine for scale-up events, but an issue arises when the cluster autoscaler attempts to scale down: because of the way the autoscaler works, it will not taint a node whose pods are not backed by a controller object, so new job pods can still be placed on that node even when other nodes have capacity to run them.
- How would you like to achieve this? (List the functional requirements here)
Add the ability to run automation jobs as a controller-backed object (such as a Kubernetes Job) instead of bare pod specs[1]. Then, in combination with the "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" annotation, running jobs would not be evicted and would continue to completion, while the cluster autoscaler could still taint the worker node for removal and prevent new job pods from being placed on it (see the sketch below).
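As a minimal sketch of this proposal, the manifest below wraps the job pod in a Kubernetes Job and applies the safe-to-evict annotation to its pod template. The Job name, image, and command are hypothetical placeholders, not the product's actual pod spec; the annotation key is the real cluster-autoscaler annotation referenced above.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: automation-job-example   # hypothetical name for illustration
spec:
  backoffLimit: 0                # do not retry a failed automation job
  template:
    metadata:
      annotations:
        # Ask the cluster autoscaler not to evict this pod while it runs;
        # the node can still be tainted for removal, blocking new job pods.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      restartPolicy: Never       # Jobs require Never or OnFailure
      containers:
        - name: automation-job
          # hypothetical image/command standing in for the real job pod spec
          image: example.com/automation-runner:latest
          command: ["ansible-runner", "run", "/runner"]
```

Here restartPolicy: Never and backoffLimit: 0 assume an interrupted automation job should fail rather than be rerun; whether retries are desirable is a design decision for the implementation.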
- List any affected known dependencies: Doc, UI, etc.
Unknown
- Github Link if any
N/A
[1] https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
[2] https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-does-scale-down-work