Type: Feature Request
Resolution: Won't Do
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: ai-ml-workloads
Labels:
- Kueue

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:
None
Products:
None
Hierarchy Progress Bar:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Impact Score:
PX Impact Range:
None
PX Priority Data:
None
PX Technical Impact:
None
PX Technical Impact Notes:
None
PX Scheduling Request:
None

1. Proposed title of this feature request

Allow configuring resource requests and limits for Kueue's controller manager

2. What is the nature and description of the request?

Allow users to customize the resource requirements of Kueue's controller manager when deploying using the operator.

3. Why does the customer need this? (List the business requirements here)

Different customers will have different numbers of Workloads, ClusterQueues, LocalQueues, etc. These resources put load on Kueue's controller manager, so the more of them exist, the more resources (memory/CPU) the controller manager needs. Without the ability to raise the requests and limits, its pod may crash (OOM) or be throttled.

The current requests and limits are:

        resources:
          limits:
            cpu: "2"
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi

In one of our clusters (Konflux) the controller crashed because of OOM. We had ~500 LocalQueues (in other clusters we can have more) and ~30 Workloads.

I'm adding a screenshot of the memory usage (taken after I patched the deployment manually and scaled the operator to 0 so it wouldn't override my change).
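For reference, the manual workaround described above could look roughly like the patch fragment below. This is only a sketch: the container name `manager` and the `2Gi` value are assumptions for illustration, not the values actually used in the Konflux cluster, and the operator has to be scaled to 0 first or it will revert the change.

```yaml
# Hypothetical strategic-merge patch for the Kueue controller manager Deployment.
# Assumptions: the container is named "manager"; 2Gi is an illustrative value only.
spec:
  template:
    spec:
      containers:
        - name: manager
          resources:
            limits:
              memory: 2Gi    # raised from the current 512Mi limit
            requests:
              memory: 2Gi    # kept equal to the limit, as in the current defaults
```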

As a side note, upstream Kueue allows configuring the resources when installing with the Helm chart.
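As an illustration of the upstream behavior, a Helm values override might look like the fragment below. The key path `controllerManager.manager.resources` is my understanding of the upstream chart layout and should be checked against the `values.yaml` of the chart version in use; the memory values are illustrative only.

```yaml
# Sketch of a Helm values override for upstream Kueue.
# Assumption: the chart exposes the controller's resources under
# controllerManager.manager.resources (verify against the chart's values.yaml).
controllerManager:
  manager:
    resources:
      limits:
        cpu: "2"
        memory: 2Gi      # illustrative value, not a recommendation
      requests:
        cpu: 500m
        memory: 2Gi      # illustrative value, not a recommendation
```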

4. List any affected packages or components.

Kueue controller manager (AI/ML workloads)

image-2025-06-12-21-42-48-587.png
2025/06/12 6:42 PM
112 kB
Gal Ben Haim

Assignee: Duncan Hardie

Reporter: Gal Ben Haim

Need Info From: None

Votes: 0

Watchers: 5

Created: 2025/06/12 6:45 PM

Updated: 2025/10/23 7:24 PM

Resolved: 2025/07/08 8:38 PM

Target start: None

Target end: None