Feature
Resolution: Done
Feature Overview
We want to relax the upstream version pin to allow 0.8.0 <= vLLM < 0.9.0, enabling RHEL AI 1.5 to use vLLM 0.8.3 with CUDA accelerators. See relevant cards:
Goals
- vLLM 0.8.3 is the target vLLM version for RHEL AI 1.5 as defined here: https://docs.google.com/document/d/1rk5lgztANsY9SO4xYUwhT0K3Q0OSSHtntk5GBo0Gf4o/edit?usp=sharing
- We also have 2 CVEs reported upstream against our current vLLM==0.7.3, and the resolution for those CVEs is to upgrade to >=0.8.0. See:
Requirements:
N/A - this feature only tracks the upstream updates needed to enable our downstream processes to build wheels and containers against vllm==0.8.3. (The only exception is Intel Gaudi 3 accelerators, which must remain on vllm==0.6.6post1.)
Done - Acceptance Criteria:
- Replace the `vllm==0.7.3` version pin with `vllm>=0.8.0,<0.9.0` so that the latest Z-stream release of vLLM v0.8 can theoretically be consumed for CUDA builds
- CI is green after updating the vLLM range (ensuring that the upstream bits are compatible with this new vLLM version)
- The 2 vLLM CVEs linked above are remediated
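As a quick illustration of what the proposed range accepts, the sketch below checks candidate versions against `>=0.8.0,<0.9.0` using plain tuple comparison (a simplification: real pip specifier matching follows PEP 440 and also handles suffixes like `0.6.6post1`, which this does not).

```python
def parse(version: str) -> tuple:
    # Split a simple "X.Y.Z" version string into a comparable tuple of ints.
    # Note: does NOT handle PEP 440 suffixes such as "post1" or "rc1".
    return tuple(int(part) for part in version.split("."))

def in_range(version: str, low: str = "0.8.0", high: str = "0.9.0") -> bool:
    # Equivalent of the pip specifier ">=0.8.0,<0.9.0" for plain versions.
    return parse(low) <= parse(version) < parse(high)

print(in_range("0.8.3"))  # True  - the RHEL AI 1.5 target version
print(in_range("0.7.3"))  # False - the old pinned version (CVE-affected)
print(in_range("0.9.0"))  # False - excluded by the upper bound
```

The upper bound `<0.9.0` keeps downstream builds from silently picking up the next minor release before it has been validated.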
is blocked by:
- RHELAI-3926 Add E2E Tests in "instructlab/instructlab" that set "use_dolomite=True" and test Llama 70B (Closed)