Initiative
Resolution: Done
Description:
Enable performance benchmarking of a single deployed model across multiple versions of the vLLM inference engine. This capability is essential for evaluating engine version regressions, improvements, and compatibility under real-world load.
The system should allow users (e.g., ML engineers) to:
- Define a list of vLLM versions to benchmark (e.g., v0.2.4, v0.3.1, v0.4.0, main)
- Run performance benchmarks against the same model using identical workload settings
Goal: Provide clear, comparable performance metrics across vLLM versions to support upgrade decisions, regression detection, and engine tuning. This task complements multi-config testing and enables fine-grained engine evolution analysis.
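As a rough illustration of the intended workflow (not part of the ticket), the sketch below benchmarks one model under identical workload settings across several vLLM versions, isolating each engine build in its own virtual environment. The model ID, the `run_benchmark.py` driver script, its flags, and the `throughput_tokens_per_s` metric key are hypothetical placeholders; substitute the actual benchmark entry point (e.g. vLLM's own benchmark scripts) in practice.

```python
"""Sketch: benchmark one model across multiple vLLM engine versions."""
import json
import subprocess
import sys
from pathlib import Path

MODEL = "meta-llama/Llama-2-7b-hf"            # placeholder model ID
VERSIONS = ["0.2.4", "0.3.1", "0.4.0"]        # versions listed in the ticket
WORKLOAD = ["--num-prompts", "1000", "--request-rate", "8"]  # identical workload settings


def bench_version(version: str) -> dict:
    """Install one vLLM version in an isolated venv and run the benchmark."""
    env_dir = Path(f".venv-vllm-{version}")
    # One environment per engine version so builds do not conflict.
    subprocess.run([sys.executable, "-m", "venv", str(env_dir)], check=True)
    py = env_dir / "bin" / "python"  # use Scripts/python.exe on Windows
    subprocess.run([str(py), "-m", "pip", "install", f"vllm=={version}"], check=True)

    out = Path(f"results-{version}.json")
    # `run_benchmark.py` is a hypothetical driver that writes JSON metrics.
    subprocess.run(
        [str(py), "run_benchmark.py", "--model", MODEL, "--output", str(out), *WORKLOAD],
        check=True,
    )
    return json.loads(out.read_text())


if __name__ == "__main__":
    results = {v: bench_version(v) for v in VERSIONS}
    for v, metrics in results.items():
        # `throughput_tokens_per_s` is an assumed metric key for comparison.
        print(v, metrics.get("throughput_tokens_per_s"))
```

Pinning each version to its own environment avoids dependency conflicts between engine releases, while holding the workload definition (prompts, request rate) constant across runs is what makes the resulting metrics comparable for regression detection.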