Type: Epic
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: RHODS
Labels:
- AI/ML

Epic Name:
Performance and Scale testing for RHOAI releases with KServe stack
Workstream:

Inference, RHOAI
Color Status:
Not Selected
Ready:
False
Blocked:
False
Blocked Reason:
None

Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done

OCP/Telco Definition of Done
Epic Template descriptions and documentation.

Epic Goal

Performance
- Run workloads on IBM pre-trained models and the curated Hugging Face open-source models to measure throughput and latency
- Make sure that the performance numbers meet the GA requirements
- Test GPU-sharing techniques, specifically MIG, and produce a best-practices guide on using MIG with watsonx models
Scalability
- Verify that the stack scales as the load increases
- Test the robustness of the stack at high scale
Establish a performance and scale tuning guide for the serving stack
Socialize the results
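
As a rough illustration of the throughput and latency numbers the Performance goal refers to, the sketch below summarizes one load-test run from per-request timings. The RequestSample record, its field names, and the choice of nearest-rank percentiles are assumptions made for this example; they do not come from ci-artifacts or from any existing tooling for this epic.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One completed inference request (hypothetical record layout)."""
    start_s: float       # time the request was sent, in seconds
    end_s: float         # time the full response was received, in seconds
    output_tokens: int   # number of tokens generated for this request


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list, with pct in (0, 100]."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(samples):
    """Return throughput and latency statistics for one load-test run."""
    if not samples:
        raise ValueError("no samples to summarize")
    latencies = sorted(s.end_s - s.start_s for s in samples)
    # Wall-clock span of the run: first request sent to last response received.
    wall_clock = max(s.end_s for s in samples) - min(s.start_s for s in samples)
    if wall_clock <= 0:
        raise ValueError("run has no measurable duration")
    return {
        "requests_per_s": len(samples) / wall_clock,
        "tokens_per_s": sum(s.output_tokens for s in samples) / wall_clock,
        "latency_p50_s": percentile(latencies, 50),
        "latency_p95_s": percentile(latencies, 95),
        "latency_max_s": latencies[-1],
    }
```

Throughput is computed over the wall-clock span of the run rather than as a sum of per-request rates, so concurrent requests are accounted for correctly.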

Why is this important?

Ensure performance and scalability of the model serving stack

Scenarios

Model performance on a single GPU
Model performance on multiple GPUs
Model serving stack scalability across multiple GPU nodes
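
One possible way to compare the single-GPU, multi-GPU, and multi-node scenarios is scaling efficiency: per-GPU throughput at each size divided by per-GPU throughput at the smallest measured size. The function below is a minimal sketch under that assumption; the function name and the input layout (GPU count mapped to aggregate throughput) are hypothetical.

```python
def scaling_efficiency(throughput_by_gpus):
    """Map GPU count -> efficiency relative to the smallest measured configuration.

    throughput_by_gpus maps a GPU count to the aggregate throughput measured
    with that many GPUs (any consistent unit, e.g. tokens per second).
    An efficiency of 1.0 means perfectly linear scaling.
    """
    if not throughput_by_gpus:
        raise ValueError("no measurements provided")
    base_gpus = min(throughput_by_gpus)
    base_per_gpu = throughput_by_gpus[base_gpus] / base_gpus
    return {
        gpus: (throughput / gpus) / base_per_gpu
        for gpus, throughput in sorted(throughput_by_gpus.items())
    }
```

For example, passing {1: 100.0, 2: 180.0, 4: 320.0} yields efficiencies of roughly 1.0, 0.9, and 0.8, which indicates diminishing returns as GPUs are added.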

Acceptance Criteria

Test automation in ci-artifacts
Regression analysis in Horreum
Published tuning and scale guide including MIG
Blog post(s) for socializing the results
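
To make the regression-analysis criterion concrete, here is a minimal sketch of a threshold check between a baseline run and a current run. It does not use Horreum's API and is not a description of how Horreum detects changes; the function name, the 5% default tolerance, and the metric names are assumptions chosen for illustration.

```python
def find_regressions(baseline, current, tolerance=0.05, higher_is_better=()):
    """Return metrics that got worse than the baseline by more than tolerance.

    baseline and current map metric names to values. Metrics listed in
    higher_is_better (e.g. throughput) regress when they drop; all other
    metrics (e.g. latency) regress when they rise. The returned dict maps
    each regressed metric to its signed relative change.
    """
    regressions = {}
    for name, base in baseline.items():
        if name not in current or base == 0:
            continue  # cannot compute a relative change for this metric
        change = (current[name] - base) / base
        worse_by = -change if name in higher_is_better else change
        if worse_by > tolerance:
            regressions[name] = change
    return regressions
```

A fixed relative tolerance is the simplest possible rule; run-to-run noise in real measurements may call for a tolerance derived from repeated baseline runs instead.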

Dependencies (internal and external)

Availability of builds with the stack from the IBM and OpenShift AI engineering teams

Previous Work (Optional):

Ansible Lightspeed performance

Open questions:

What are the performance requirements?
Which platforms do we need to test: ROSA, ROKS, on-prem?
Which CPUs/GPUs need to be tested?

relates to

PSAP-1116 P&S of watsonx prompt tuned model serving

Closed

Assignee: David Gray

Reporter: Ashish Kamra

Contributors: Kevin Pouget

Votes: 0

Watchers: 2

Due: 2023/09/30

Created: 2023/07/07 12:41 PM

Updated: 2024/11/11 9:38 PM

Resolved: 2024/11/11 9:38 PM