Type: Task
Resolution: Unresolved
Priority: Major
Sprint: ML Serving Sprint 1.29
The HPA feature of ModelMesh supports only CPU- and memory-based autoscaling. To make better scaling decisions, the autoscaler needs additional metrics, so we should research options such as custom metrics or the Custom Metrics Autoscaler Operator.
The goal of this ticket is to research those options.
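As a starting point for the research, here is a minimal sketch of what an HPA driven by a custom per-pod metric could look like, assuming something like the Prometheus Adapter or the Custom Metrics Autoscaler Operator (KEDA) exposes the metric to the custom metrics API. The metric name `modelmesh_requests_per_second`, the target value, and the Deployment name are illustrative assumptions, not confirmed ModelMesh details:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: modelmesh-serving
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: modelmesh-serving   # assumed target workload name
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: modelmesh_requests_per_second  # hypothetical metric; must be exposed via an adapter
      target:
        type: AverageValue
        averageValue: "100"
```

With a `type: Pods` metric, the HPA averages the value across pods and scales to keep that average at the target, which is the behavior we would want for request-rate-based scaling.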
slack thread: https://redhat-internal.slack.com/archives/C04FSLYLDQ8/p1685061377990309
David's idea is the following:

Here are my ideas that I know we can get from prometheus metrics:
- average latency of modelmesh requests
- average requests per minute
- gpu utilization

and others that may be possible, but I don't know all the modelmesh metrics currently available:
- queue length at runtime or modelmesh request level
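The metrics above could be derived with PromQL along these lines. The ModelMesh metric name used here (`modelmesh_api_request_milliseconds`) is an assumption and needs to be confirmed against the actual metrics ModelMesh exposes; `DCGM_FI_DEV_GPU_UTIL` assumes the NVIDIA DCGM exporter is deployed:

```promql
# average latency of modelmesh requests over 5m (assumed histogram metric)
rate(modelmesh_api_request_milliseconds_sum[5m])
  / rate(modelmesh_api_request_milliseconds_count[5m])

# average requests per minute (assumed counter)
rate(modelmesh_api_request_milliseconds_count[1m]) * 60

# average GPU utilization across GPUs (DCGM exporter, if deployed)
avg(DCGM_FI_DEV_GPU_UTIL)
```

Part of the research should be enumerating which of these series ModelMesh actually publishes, and whether queue length is observable at the runtime or request level.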
Reference: