Feature Request
Resolution: Won't Do
1. Proposed title of this feature request
Allow configuring resource requests and limits for Kueue's controller manager
2. What is the nature and description of the request?
Allow users to customize the resource requirements of Kueue's controller manager when deploying using the operator.
3. Why does the customer need this? (List the business requirements here)
Different customers will have different numbers of Workloads, ClusterQueues, LocalQueues, etc. These resources put load on Kueue's controller manager, so the more of them exist, the more resources (memory/CPU) the controller manager needs. Without the ability to raise its requests and limits, its pod may crash (OOM) or be throttled.
The current requests and limits are:
resources:
  limits:
    cpu: "2"
    memory: 512Mi
  requests:
    cpu: 500m
    memory: 512Mi
In one of our clusters (Konflux) the controller crashed due to OOM. We had ~500 LocalQueues (other clusters may have more) and ~30 Workloads.
I'm adding a screenshot of the memory usage (taken after I patched the deployment manually and scaled the operator to 0 so that it would not override my change).
As a side note, upstream Kueue allows configuring the resources when installing with the Helm chart.
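For reference, the manual workaround described above can be sketched as a strategic merge patch. This is only an illustration: the container name `manager` is an assumption, the memory value is a made-up example rather than the one actually used, and the deployment name and namespace are left as placeholders.

```yaml
# patch.yaml: hypothetical sketch of the manual workaround.
# The container name "manager" is an assumption; the memory value is illustrative only.
spec:
  template:
    spec:
      containers:
        - name: manager
          resources:
            limits:
              cpu: "2"
              memory: 2Gi      # raised from 512Mi (example value)
            requests:
              cpu: 500m
              memory: 2Gi      # kept equal to the limit, as in the current defaults
```

It could be applied with something like `kubectl patch deployment <deployment-name> -n <namespace> --patch-file patch.yaml`. As noted above, the operator has to be scaled to 0 first or it reverts the change, which is why a supported configuration option is being requested.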
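As a rough illustration of the upstream behaviour, a Helm values override might look like the sketch below. The key path `controllerManager.manager.resources` is written from memory of the upstream chart and should be treated as an assumption to verify against the chart's `values.yaml`; the memory values are examples only.

```yaml
# values-override.yaml: hypothetical override for the upstream Kueue Helm chart.
# Key path is an assumption; check the chart's values.yaml before relying on it.
controllerManager:
  manager:
    resources:
      limits:
        cpu: "2"
        memory: 2Gi      # example value
      requests:
        cpu: 500m
        memory: 2Gi      # example value
```

An equivalent field exposed by the operator would cover this request.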
4. List any affected packages or components.
kueue controller manager (AI/ML workloads)