Type: Story
Resolution: Unresolved
Priority: Major
Sprint: AUTOSCALE - Sprint 277
agarcial@redhat.com opened this card upstream as https://github.com/kubernetes-sigs/karpenter/issues/2385, and the latest update there is that we may need an RFC to push the discussion further.
There has already been a similar ask here: https://github.com/kubernetes-sigs/karpenter/pull/1160, but it went stale.
This card documents what we need to do to make progress. The last time we looked at this, these were the Karpenter working group notes: https://docs.google.com/document/d/18BT0AIMugpNpiSPJNlcAL2rv69yAE6Z06gUVj7v_clg/edit?tab=t.0#heading=h.e3tzux0nm96 and this was the recording for context: https://www.youtube.com/watch?v=RvDyF9TtmjY
I tested what was said in that meeting, and it's true: you can limit the number of nodes you provision with the NodePool.spec.limits.nodes field, but the limit applies per NodePool. So if we were to use this "feature" as-is, we would need some way to limit the number of Nodes across multiple possible NodePools, including nodes that Karpenter doesn't control at all.
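For reference, here is a minimal NodePool sketch showing the field in question. This is an illustration only: it assumes an AWS EC2NodeClass named `default`, and it relies on the `limits.nodes` behaviour described in the meeting, which is apparently an unintended feature. The key point is that the limit caps this one NodePool, not the cluster.

```yaml
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: default
spec:
  template:
    spec:
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values: ["amd64"]
  limits:
    cpu: "1000"   # normal resource limit, shown for contrast
    nodes: "10"   # caps node count, but only for this NodePool
```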
DoD:
- Figure out whether we can simply leverage the existing functionality and have HyperShift code work around its limitations (a rough Go sketch of this idea follows after this list).
  - This is the easiest way to avoid the upstream dance, but it probably requires some hackery, since the per-NodePool node-limit functionality is apparently an unintended feature upstream.
  - It might also have unintended consequences: we would have to think about how consolidation behaves when a NodePool is at its limit.
- If we can't make that work, we should propose a way to limit Nodes or NodeClaims globally, rather than through NodePools.
  - There was some existing work on this, but it was closed due to staleness: https://github.com/kubernetes-sigs/karpenter/pull/1151
  - We would need an RFC and/or a WIP PR submitted to the kubernetes-sigs/karpenter repository detailing our actual requirements.
- Keep in mind that whatever we do needs to solve our original problem for ROSA AutoNode, namely:
  - The maximum must be configurable by the Service Delivery admin and adjustable without disruption (e.g., a container restart).
  - Karpenter must take ALL nodes in the cluster into account (not just Karpenter-provisioned ones) when respecting this maximum.
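For the first DoD bullet, a rough sketch of what "hacking around it" on the HyperShift side could look like. This is purely hypothetical code, not anything that exists today: it counts every Node in the guest cluster (Karpenter-managed or not) and, once a cluster-wide maximum is reached, patches each NodePool's `spec.limits.nodes` down to zero so Karpenter stops provisioning. The `maxClusterNodes` value and the overall shape are assumptions, and it deliberately ignores the consolidation question raised above.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// nodePoolGVR addresses the Karpenter NodePool custom resource (karpenter.sh/v1).
var nodePoolGVR = schema.GroupVersionResource{Group: "karpenter.sh", Version: "v1", Resource: "nodepools"}

// enforceClusterNodeLimit is a hypothetical reconciliation step: if the total number
// of Nodes in the cluster has reached maxClusterNodes, it clamps every NodePool's
// spec.limits.nodes to "0" so Karpenter provisions nothing further. Existing nodes
// are left untouched.
func enforceClusterNodeLimit(ctx context.Context, kube kubernetes.Interface, dyn dynamic.Interface, maxClusterNodes int) error {
	// Count ALL nodes, not just the ones Karpenter provisioned.
	nodes, err := kube.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	if len(nodes.Items) < maxClusterNodes {
		return nil // still under the cluster-wide cap, nothing to do
	}

	// At or over the cap: lean on the per-NodePool limits.nodes behaviour described above.
	nodePools, err := dyn.Resource(nodePoolGVR).List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	patch := []byte(`{"spec":{"limits":{"nodes":"0"}}}`)
	for _, np := range nodePools.Items {
		if _, err := dyn.Resource(nodePoolGVR).Patch(ctx, np.GetName(),
			types.MergePatchType, patch, metav1.PatchOptions{}); err != nil {
			return fmt.Errorf("patching NodePool %s: %w", np.GetName(), err)
		}
	}
	return nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	kube := kubernetes.NewForConfigOrDie(cfg)
	dyn := dynamic.NewForConfigOrDie(cfg)
	// In practice maxClusterNodes would come from Service Delivery configuration,
	// which is exactly the piece that needs to be adjustable without disruption.
	if err := enforceClusterNodeLimit(context.Background(), kube, dyn, 50); err != nil {
		panic(err)
	}
}
```

Note that clamping limits only stops new provisioning; a real implementation would also need to restore limits once the cluster drops back under the cap and reason about consolidation at the limit, which is why the upstream RFC route may still be the cleaner answer.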