Performance and Scale for AI Platforms / PSAP-1116

P&S of watsonx prompt tuned model serving


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Major
    • RHODS
    • Performance and scalability of watsonx prompt tuning
    • To Do

      Epic Goal

      • Test the performance and scalability of prompt tuned models served via the watsonx stack
      • Ensure stability with many (thousands of) users, each sending queries to prompt tuned model instances
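The concurrency goal above can be sketched as a small load generator. This is only an illustrative outline, not the planned test harness: `send_query`, `timed_query`, and `run_load` are hypothetical names, the simulated sleep stands in for a real request to a prompt tuned model endpoint (whose API is not described in this epic), and the user counts are placeholders rather than agreed targets.

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def send_query(user_id: int, prompt: str) -> str:
    """Hypothetical stand-in for a request to a prompt tuned model endpoint.

    Replace the body with a real client call once the serving API is known.
    """
    time.sleep(random.uniform(0.001, 0.005))  # simulated service time (assumption)
    return f"user-{user_id}: response to {prompt!r}"


def timed_query(user_id: int) -> float:
    """Send one query and return its latency in seconds."""
    start = time.perf_counter()
    send_query(user_id, "hello")
    return time.perf_counter() - start


def run_load(num_users: int, queries_per_user: int, max_workers: int = 64) -> dict:
    """Issue num_users * queries_per_user queries concurrently and summarize latency."""
    jobs = [user for user in range(num_users) for _ in range(queries_per_user)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        latencies = list(pool.map(timed_query, jobs))
    cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
    return {
        "requests": len(latencies),
        "p50_s": cuts[49],
        "p95_s": cuts[94],
        "p99_s": cuts[98],
    }


if __name__ == "__main__":
    # Placeholder scale; real targets depend on the open questions below.
    print(run_load(num_users=200, queries_per_user=2))
```

Threads are used here because the work is I/O bound; a real test at thousands of users would more likely rely on a dedicated load-testing tool and would track error rates alongside latency.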

      Why is this important?

      • As raised by Daniele, depending on the architecture, prompt tuning may produce a large number of CRs, models, and Pods, so the scalability of the architecture and of the relevant controllers must be tested. Control plane load should also be monitored.

      Scenarios

      1. ...

      Acceptance Criteria

      • ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      Open questions:

      1. When will this feature be enabled, and how can it be used / tested?
      2. What are the requirements / expectations in terms of number of users, namespaces, models / requests per minute?

            Assignee: Unassigned
            Reporter: David Gray (dagray@redhat.com)
