Type: Epic
Resolution: Done
Priority: Normal
Summary: LLM load testing enhancements
Components: Inference, RHOAI
Progress: 0% To Do, 0% In Progress, 100% Done
Epic Goal
Improve our load-testing tool, llm-load-test, and related automation to keep pace with best practices and the state of the art:
- Use a dataset representative of a wider set of use cases – input/output lengths ranging from 0 to 4096 tokens, with an option to configure the bounds for each test
- Measure time to first token (TTFT) and time per output token (TPOT)
- Develop a load generator that can be used to test models in various runtimes with different interfaces (gRPC, HTTP)
- Use MLCommons LoadGen to drive load
- Potentially integrate MLCommons LoadGen into our load-testing tool
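The TTFT/TPOT metrics above can be computed from the arrival times of streamed tokens. Below is a minimal illustrative sketch, not llm-load-test's actual implementation; the token-iterator interface is an assumption standing in for whatever streaming client the runtime exposes.

```python
import time


def measure_streaming_latency(token_iter):
    """Compute time to first token (TTFT) and average time per output
    token (TPOT) from an iterator that yields tokens as they arrive."""
    start = time.monotonic()
    first_token_time = None
    token_count = 0
    for _ in token_iter:
        now = time.monotonic()
        if first_token_time is None:
            first_token_time = now
        token_count += 1
    end = time.monotonic()
    ttft = first_token_time - start if first_token_time is not None else None
    # TPOT averages the inter-token time after the first token arrives.
    tpot = (end - first_token_time) / (token_count - 1) if token_count > 1 else None
    return ttft, tpot
```

A load generator would run this per request and aggregate percentiles across concurrent streams; TTFT reflects prefill latency, while TPOT reflects decode throughput.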
Why is this important?
- …
Scenarios
- ...
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>