Epic Goal
- Test the performance and scalability of prompt-tuned models served via the watsonx stack
- Ensure stability with many (thousands of) users, each sending queries to prompt-tuned model instances
Why is this important?
- As raised by Daniele, prompt tuning may, depending on the architecture, result in large numbers of CRs / models / Pods, so the scalability of the architecture and the relevant controllers must be tested. We should also monitor control plane load.
Scenarios
- ...
Acceptance Criteria
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
- ...
Open questions:
- When will this feature be enabled, and how can it be used / tested?
- What are the requirements / expectations in terms of number of users, namespaces, models / requests per minute?
- is related to: PSAP-1112 Performance and Scale testing for RHOAI releases with KServe stack (Closed)