Type: Feature
Resolution: Unresolved
Priority: Major
Request-based autoscaling is not ideal for LLMs because there is no consistent correlation between the number of requests and resource utilization, so concurrency is not a good autoscaling signal.
Users need to autoscale AI workloads on other metrics, such as throughput and latency. Use cases reported at the K8s Serving WG:
https://docs.google.com/document/d/1IFsCwWtIGMujaZZqEMR4ZYeZBi7Hb1ptfImCa1fFf1A/edit?resourcekey=0-8lD1pc_wDVxiwyI8SIhBCw#heading=h.msa1v1j90u
Metrics doc from the same WG: https://docs.google.com/document/d/1SpSp1E6moa4HSrJnS4x3NpLuj88sMXr2tbofKlzTZpk/edit?resourcekey=0-ob5dR-AJxLQ5SvPlA4rdsg#heading=h.qmzyorj64um1
This capability has also been requested by the RHAI field teams.
This issue covers the KServe and Serving integration (serverless mode), adding support for custom metrics via KEDA; a rough sketch of the KEDA side is included below.
On the KServe side there is already a PR to support KEDA integration in raw deployment mode.
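To make the custom-metrics idea concrete, here is a minimal sketch (not the implementation tracked by this issue) of creating a KEDA ScaledObject for a KServe predictor Deployment via the Kubernetes Python client, scaling on a latency metric from Prometheus instead of request concurrency. The deployment name, namespace, Prometheus address, query, and threshold are hypothetical placeholders; only KEDA's scaledobjects.keda.sh/v1alpha1 resource and its prometheus trigger type are assumed.

```python
# Sketch: scale a KServe predictor Deployment on a latency metric via KEDA,
# using the official Kubernetes Python client. All names/values are examples.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in-cluster

scaled_object = {
    "apiVersion": "keda.sh/v1alpha1",
    "kind": "ScaledObject",
    "metadata": {"name": "llm-predictor-scaler", "namespace": "models"},
    "spec": {
        # Hypothetical KServe predictor Deployment (raw deployment mode).
        "scaleTargetRef": {"name": "llm-predictor-default"},
        "minReplicaCount": 1,
        "maxReplicaCount": 8,
        "triggers": [
            {
                "type": "prometheus",
                "metadata": {
                    "serverAddress": "http://prometheus.monitoring:9090",
                    # Illustrative query: p95 time-per-output-token latency.
                    "query": "histogram_quantile(0.95, sum(rate(time_per_output_token_seconds_bucket[2m])) by (le))",
                    "threshold": "0.2",
                },
            }
        ],
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="keda.sh",
    version="v1alpha1",
    namespace="models",
    plural="scaledobjects",
    body=scaled_object,
)
```

In serverless mode the equivalent behavior would need to go through the Knative Serving autoscaling path rather than a plain Deployment target, which is the gap this issue is about.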
Depends on: SRVKS-1224 Support custom metrics with the Keda HPA based autoscaler (In Progress)