Initiative
Resolution: Duplicate
Extend the JBenchmark system to support structured benchmarking across multiple llm-d configuration scenarios. This task aims to evaluate how different llm-d runtime settings affect model performance, stability, and resource efficiency when integrated with GuideLLM.
This includes the ability to:
- Benchmark under different router configurations (e.g., batching strategy, latency targets)
- Evaluate various Placement/Dispatch (P/D) strategies (e.g., static vs. dynamic node selection, GPU/resource awareness)
- Run benchmarks against different GuideLLM datasets to simulate a variety of enterprise use cases
- Capture rich metadata for every benchmark run to enable reproducible comparisons (a minimal config sketch follows this list)
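As a point of reference, here is a minimal sketch of how one benchmark scenario could be described so that its full configuration travels with the run. The names (`ScenarioConfig`, `RouterConfig`, `PDPolicy`) and field choices are illustrative assumptions, not existing JBenchmark, llm-d, or GuideLLM APIs.

```python
# Hypothetical scenario description; all class and field names are assumptions.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json
import uuid


@dataclass
class RouterConfig:
    strategy: str = "round-robin"        # e.g. "round-robin", "latency-aware"
    batching: str = "dynamic"            # batching strategy applied by the router
    max_concurrency: int = 32
    latency_target_ms: int | None = None


@dataclass
class PDPolicy:
    node_selection: str = "static"       # "static" or "dynamic"
    gpu_aware: bool = True
    node_affinity: dict = field(default_factory=dict)


@dataclass
class ScenarioConfig:
    name: str
    router: RouterConfig
    pd_policy: PDPolicy
    dataset: str                         # GuideLLM dataset identifier
    run_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def metadata(self) -> dict:
        """Full config metadata attached to every benchmark run."""
        return asdict(self)


if __name__ == "__main__":
    scenario = ScenarioConfig(
        name="latency-aware-chat",
        router=RouterConfig(strategy="latency-aware", latency_target_ms=200),
        pd_policy=PDPolicy(node_selection="dynamic"),
        dataset="guidellm-chat-prompts",
    )
    print(json.dumps(scenario.metadata(), indent=2))
```

Serializing the whole scenario (rather than only the parameters under test) is what makes later runs reproducible and comparable.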
Examples of Configurable Parameters:
- Router strategies (e.g., round-robin, latency-aware)
- Batching configurations and max concurrency
- P/D policies and node-affinity rules
- Dataset variability (prompt types, lengths, formats); a parameter-sweep sketch follows this list
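A sweep over these parameters could be expanded into a grid of concrete scenarios, as in the sketch below. The parameter names and values are placeholders for illustration, not actual llm-d or GuideLLM settings.

```python
# Hypothetical parameter sweep; values are illustrative only.
from itertools import product

ROUTER_STRATEGIES = ["round-robin", "latency-aware"]
MAX_CONCURRENCY = [8, 32, 128]
PD_POLICIES = ["static", "dynamic"]
DATASETS = ["short-chat", "long-context", "code-generation"]


def build_scenarios() -> list[dict]:
    """Cartesian product of the configurable parameters, one dict per run."""
    scenarios = []
    for strategy, concurrency, pd_policy, dataset in product(
        ROUTER_STRATEGIES, MAX_CONCURRENCY, PD_POLICIES, DATASETS
    ):
        scenarios.append(
            {
                "router_strategy": strategy,
                "max_concurrency": concurrency,
                "pd_policy": pd_policy,
                "dataset": dataset,
            }
        )
    return scenarios


if __name__ == "__main__":
    grid = build_scenarios()
    print(f"{len(grid)} benchmark scenarios")  # 2 * 3 * 2 * 3 = 36
    print(grid[0])
```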
Acceptance Criteria:
- Benchmark runs can be parameterized with different llm-d settings
- All runs are tagged with full config metadata
- Results are stored, queryable, and comparable via the benchmarking dashboard (see the recording/comparison sketch after this list)
- Significant configuration impacts are highlighted in reports for GuideLLM/llm-d stakeholders
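To make the tagging and comparison criteria concrete, the sketch below records each run's metrics together with its config metadata and aggregates a metric across one configuration dimension. The JSONL record layout and metric names are assumptions, not an existing JBenchmark schema.

```python
# Hypothetical result store; record layout and metric names are assumptions.
import json
from pathlib import Path

RESULTS_FILE = Path("benchmark_results.jsonl")


def record_run(config: dict, metrics: dict) -> None:
    """Append one benchmark run, tagged with its full config metadata."""
    with RESULTS_FILE.open("a") as f:
        f.write(json.dumps({"config": config, "metrics": metrics}) + "\n")


def compare(param: str, metric: str) -> dict:
    """Group stored runs by one config parameter and average a metric."""
    groups: dict = {}
    if not RESULTS_FILE.exists():
        return groups
    for line in RESULTS_FILE.read_text().splitlines():
        run = json.loads(line)
        key = run["config"].get(param)
        groups.setdefault(key, []).append(run["metrics"][metric])
    return {k: sum(v) / len(v) for k, v in groups.items()}


if __name__ == "__main__":
    record_run(
        {"router_strategy": "latency-aware", "dataset": "short-chat"},
        {"p95_latency_ms": 310.0, "throughput_rps": 42.5},
    )
    record_run(
        {"router_strategy": "round-robin", "dataset": "short-chat"},
        {"p95_latency_ms": 455.0, "throughput_rps": 39.1},
    )
    print(compare("router_strategy", "p95_latency_ms"))
```

A query layer like this (grouping runs by any config field) is the kind of comparison the dashboard would surface when highlighting significant configuration impacts.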