Epic Goal
- Performance
  - Run workloads on IBM pre-trained models and the curated Hugging Face open-source models to measure throughput and latency
  - Verify that the performance numbers meet the GA requirements
  - Test GPU-sharing techniques, specifically Multi-Instance GPU (MIG), and produce a best-practices guide for using MIG with watsonx models
- Scalability
  - Verify that the stack scales as the load increases
  - Test the robustness of the stack at high scale
  - Establish a performance and scale tuning guide for the serving stack
- Socialize the results
Why is this important?
- Ensure the performance and scalability of the model serving stack
Scenarios
- Model performance on a single GPU
- Model performance on multiple GPUs
- Model serving stack scalability across multiple GPU nodes
Acceptance Criteria
- Test automation in ci-artifacts
- Regression analysis in Horreum
- Published tuning and scale guide, including MIG
- Blog post(s) for socializing the results
Dependencies (internal and external)
- Availability of builds of the stack from the IBM and OpenShift AI engineering teams
Previous Work (Optional):
- Ansible Lightspeed performance
Open questions:
- What are the performance requirements?
- Which platforms do we need to test: ROSA, ROKS, on-prem?
- Which CPUs/GPUs need to be tested?
- Relates to: PSAP-1116 P&S of watsonx prompt tuned model serving (Closed)