Knative Serving / SRVKS-1244

Custom metrics support


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Major

      Request-based autoscaling is not ideal for LLM workloads because the number of requests does not correlate reliably with resource utilization. As a result, concurrency is not a good autoscaling signal for them.
      Users need to autoscale AI workloads based on other metrics, such as throughput and latency. Use cases reported at the K8s Serving WG:
      https://docs.google.com/document/d/1IFsCwWtIGMujaZZqEMR4ZYeZBi7Hb1ptfImCa1fFf1A/edit?resourcekey=0-8lD1pc_wDVxiwyI8SIhBCw#heading=h.msa1v1j90u
      Metrics doc from the same WG: https://docs.google.com/document/d/1SpSp1E6moa4HSrJnS4x3NpLuj88sMXr2tbofKlzTZpk/edit?resourcekey=0-ob5dR-AJxLQ5SvPlA4rdsg#heading=h.qmzyorj64um1

      This is also a request that was reported by the RHAI field folks.
      This issue covers the KServe and Serving integration (serverless mode), adding support for custom metrics via KEDA.
      On the KServe side there is already a PR to support KEDA integration in raw deployment mode.
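      To illustrate the kind of configuration this feature would enable, the sketch below shows a KEDA ScaledObject that scales a workload on a latency metric pulled from Prometheus. This is a minimal sketch only: the resource names, namespace, Prometheus address, and the metric/query are assumptions, and the actual KServe/Serving integration would generate or manage an equivalent object rather than require users to write it by hand.

      ```yaml
      # Illustrative sketch: all names, addresses, and the query are assumptions.
      apiVersion: keda.sh/v1alpha1
      kind: ScaledObject
      metadata:
        name: llm-predictor-scaler
        namespace: default
      spec:
        scaleTargetRef:
          # Hypothetical scale target for the model server.
          name: llm-predictor
        minReplicaCount: 1
        maxReplicaCount: 10
        triggers:
          - type: prometheus
            metadata:
              serverAddress: http://prometheus.monitoring:9090
              # Hypothetical latency metric: scale out when p95 request
              # latency exceeds 0.5 s over a 2-minute window.
              query: histogram_quantile(0.95, sum(rate(request_duration_seconds_bucket{service="llm-predictor"}[2m])) by (le))
              threshold: "0.5"
      ```

      The same trigger mechanism would cover throughput-based scaling by swapping in a rate query over a request or token counter.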

              Assignee: Unassigned
              Reporter: Stavros Kontopoulos (skontopo@redhat.com)
              Votes: 0
              Watchers: 1
