Performance and Scale for AI Platforms / PSAP-1116

Performance and scale of watsonx prompt-tuned model serving


    • Type: Epic
    • Resolution: Obsolete
    • Priority: Major
    • RHODS
    • Epic Name: Performance and scalability of watsonx prompt tuning
    • Labels: Inference, RHOAI

      Epic Goal

      • Test the performance and scalability of prompt-tuned models served via the watsonx stack
      • Ensure stability with many users (thousands), each sending queries to prompt-tuned model instances; see the load-generation sketch below
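
      A minimal load-generation sketch for the many-users goal above, assuming a KServe-style HTTP inference endpoint; the base URL, model names, and request payload are placeholders that would need to be adapted to the actual watsonx serving API:

      import concurrent.futures
      import time

      import requests

      BASE_URL = "https://example-serving.apps.example.com"   # placeholder endpoint
      MODELS = [f"prompt-tuned-{i}" for i in range(100)]       # hypothetical model names
      USERS = 1000                                             # simulated concurrent users
      REQUESTS_PER_USER = 10

      def user_session(user_id: int) -> list[float]:
          """One simulated user: send a fixed number of queries and record latencies."""
          latencies = []
          model = MODELS[user_id % len(MODELS)]
          for _ in range(REQUESTS_PER_USER):
              payload = {"inputs": "What is prompt tuning?"}   # placeholder request body
              start = time.perf_counter()
              resp = requests.post(f"{BASE_URL}/v2/models/{model}/infer",
                                   json=payload, timeout=60)
              resp.raise_for_status()
              latencies.append(time.perf_counter() - start)
          return latencies

      def main() -> None:
          all_latencies = []
          with concurrent.futures.ThreadPoolExecutor(max_workers=USERS) as pool:
              for result in pool.map(user_session, range(USERS)):
                  all_latencies.extend(result)
          all_latencies.sort()
          p99 = all_latencies[int(len(all_latencies) * 0.99)]
          print(f"requests: {len(all_latencies)}, p99 latency: {p99:.3f}s")

      if __name__ == "__main__":
          main()

      A real test would more likely use a dedicated load-generation tool, but the sketch shows the shape of the workload: many users, each repeatedly querying one of many prompt-tuned model instances.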

      Why is this important?

      • As raised by Daniele, depending on the architecture, prompt tuning may result in a large number of CRs / models / Pods, etc., so the scalability of the architecture and of the relevant controllers must be tested. We should also keep an eye on control-plane load; see the object-count sketch below.
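
      To keep an eye on how many objects the architecture creates (CRs, Pods) and, indirectly, on control-plane load, a sketch along these lines could be run periodically during the test. It assumes the prompt-tuned models are exposed as KServe InferenceService CRs; the group/version/plural and the Pod label selector are assumptions to verify against the actual watsonx stack:

      import collections

      from kubernetes import client, config

      def main() -> None:
          config.load_kube_config()   # or config.load_incluster_config() inside the cluster
          custom = client.CustomObjectsApi()
          core = client.CoreV1Api()

          # Count InferenceService CRs per namespace (cluster-wide list).
          isvcs = custom.list_cluster_custom_object(
              group="serving.kserve.io", version="v1beta1", plural="inferenceservices")
          isvc_per_ns = collections.Counter(
              item["metadata"]["namespace"] for item in isvcs["items"])
          print("InferenceServices per namespace:", dict(isvc_per_ns))

          # Count serving Pods per namespace (label selector is an assumption).
          pods = core.list_pod_for_all_namespaces(
              label_selector="serving.kserve.io/inferenceservice")
          pod_per_ns = collections.Counter(p.metadata.namespace for p in pods.items)
          print("Serving Pods per namespace:", dict(pod_per_ns))

      if __name__ == "__main__":
          main()

      Tracking these counts over time, alongside apiserver and etcd metrics, would show whether the number of prompt-tuned model instances grows faster than the control plane and the relevant controllers can handle.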

      Scenarios

      1. ...

      Acceptance Criteria

      • ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      Open questions:

      1. When will this feature be enabled, and how can it be used / tested?
      2. What are the requirements / expectations in terms of the number of users, namespaces, models, and requests per minute?

              Assignee: Unassigned
              Reporter: David Gray (dagray@redhat.com)
              Votes: 0
              Watchers: 2
