-
Story
-
Resolution: Done
-
Major
this should follow our normal process for rebasing the autoscaler, but there is one change that we will need to carry as a patch in the short term.
we recently merged the "scale from zero" feature into the upstream code. this means we can remove our carry patch for that feature, but we need to be careful when we do this rebase. when we took the feature upstream, some changes were required, specifically around the annotations that denote the resource capacities of the machines a machineset will create. we will need to carry a patch in our autoscaler to ensure that the current behavior continues as expected, and then we will create a more durable solution with the work that comes out of OCPCLOUD-1658.
for this rebase we need to look at the annotations for the GPU type and GPU count. currently, our implementation assumes that all GPUs are of the nvidia.com/gpu variety, but the upstream implementation allows the annotation to specify the type of GPU. we need to carry a patch that defaults the type to "nvidia.com/gpu" until we have made the necessary changes that come out of OCPCLOUD-1658.
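as a rough illustration of the defaulting behavior described above, here is a minimal sketch. it is not the actual carry patch: the annotation key and the function name `gpuTypeOrDefault` are assumptions made for the example and should be checked against the real upstream code. only the "nvidia.com/gpu" default comes from this ticket.

```go
package main

import "fmt"

const (
	// gpuTypeAnnotation is an assumed key for the upstream GPU type
	// annotation; verify the real name in the upstream provider code.
	gpuTypeAnnotation = "capacity.cluster-autoscaler.kubernetes.io/gpu-type"

	// defaultGPUType is the type our current implementation assumes.
	defaultGPUType = "nvidia.com/gpu"
)

// gpuTypeOrDefault (hypothetical helper) returns the GPU type named in the
// annotations, or the nvidia default when the annotation is missing or empty.
func gpuTypeOrDefault(annotations map[string]string) string {
	if v, ok := annotations[gpuTypeAnnotation]; ok && v != "" {
		return v
	}
	return defaultGPUType
}

func main() {
	// a machineset with no GPU type annotation falls back to the default.
	fmt.Println(gpuTypeOrDefault(map[string]string{}))

	// an explicit annotation value is passed through unchanged.
	fmt.Println(gpuTypeOrDefault(map[string]string{gpuTypeAnnotation: "amd.com/gpu"}))
}
```

treating an empty value the same as a missing annotation is a choice made for this sketch; the real patch may want to handle that case differently.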
we are adding this carry patch now because updating the annotations in openshift requires changes to the machineset controllers for all the providers, and potentially to the cluster-autoscaler-operator as well.
additionally, the annotation names are different in the upstream, reflecting ownership by the cluster-autoscaler. we will need to change our annotations as well, but in the short term we should accept both sets of annotations until we have been able to deprecate the currently used values.
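accepting both annotation names could look something like the sketch below. this is only an illustration under stated assumptions: both key names, the `annotationKeys` type, and the `lookupAnnotation` helper are hypothetical, and preferring the upstream key when both are present is a choice made for the example, not something decided in this ticket.

```go
package main

import "fmt"

// annotationKeys pairs an upstream annotation key with the key we
// currently use. the type is hypothetical, for illustration only.
type annotationKeys struct {
	upstream string
	legacy   string
}

// gpuCountKeys holds assumed key names; verify both against the
// upstream provider code and our machineset controllers.
var gpuCountKeys = annotationKeys{
	upstream: "capacity.cluster-autoscaler.kubernetes.io/gpu-count",
	legacy:   "machine.openshift.io/GPU",
}

// lookupAnnotation (hypothetical helper) checks the upstream key first
// and falls back to the legacy key, so either annotation is accepted.
func lookupAnnotation(annotations map[string]string, keys annotationKeys) (string, bool) {
	if v, ok := annotations[keys.upstream]; ok {
		return v, true
	}
	if v, ok := annotations[keys.legacy]; ok {
		return v, true
	}
	return "", false
}

func main() {
	// a machineset that only carries the currently used annotation.
	legacyOnly := map[string]string{gpuCountKeys.legacy: "2"}
	v, ok := lookupAnnotation(legacyOnly, gpuCountKeys)
	fmt.Println(v, ok)
}
```

keeping the key pairs in one place would make it easier to drop the legacy names once they are deprecated.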
- blocks
-
OCPCLOUD-1677 Update cluster autoscaler operator to use --record-duplicated-events flag
- Closed