Type: Story
Resolution: Unresolved
Priority: Major
Goal:
As a user of RHEL AI, I want the best possible performance and throughput out of my inference server.
vLLM's v1 engine is a flagship feature of recent vLLM releases, but late in the RHEL AI 1.5 cycle we hit a bug (RHELAI-4084) that forced us to disable vLLM v1 for Nvidia accelerators. The fix has been merged upstream at https://github.com/vllm-project/vllm/pull/17855 but, as of the creation of this issue, has not yet been released.
Once the fix lands in a vLLM release (likely 0.8.6 or later), we need to test reverting the swap to vLLM v0 (i.e., reverting https://gitlab.com/redhat/rhel-ai/containers/instructlab-nvidia/-/merge_requests/600) and confirm that inference, and specifically SDG, works with the v1 engine.
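A minimal sketch of how to confirm the revert took effect, assuming the container currently disables the v1 engine by setting `VLLM_USE_V1=0` (vLLM's standard engine toggle); `<instructlab-nvidia-image>` is a placeholder for the actual image reference:

```sh
# Check whether the image still forces the v0 engine. printenv exits
# nonzero when the variable is unset, which is what we want post-revert.
podman run --rm <instructlab-nvidia-image> printenv VLLM_USE_V1

# After reverting MR 600 this should fail (variable unset), letting vLLM
# select the v1 engine on its own for supported configurations.
```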
Acceptance Criteria:
- The instructlab-nvidia container does not disable vLLM v1 via an environment variable.
- Serving models via `ilab model serve` uses the vLLM v1 engine and works with our supported models (Granite variants, Mixtral with adapters, Prometheus).
- `ilab data generate` completes successfully on the vLLM v1 engine with our default agentic pipeline, which runs against the Mixtral teacher model with skills/knowledge adapters (see the verification sketch after this list).
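A rough sketch of the acceptance flow, with the caveat that the exact startup log wording varies by vLLM release and the defaults assume a standard `ilab` configuration:

```sh
# Serve a supported model; with VLLM_USE_V1 no longer forced to 0, vLLM
# should pick the v1 engine (recent releases note the engine choice in
# the startup log).
ilab model serve

# Exercise SDG end to end with the default pipeline, which runs against
# the Mixtral teacher model with the skills/knowledge adapters.
ilab data generate
```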