- Bug
- Resolution: Done
- Undefined
- rhelai-1.5
- None
To Reproduce
Steps to reproduce the behavior:
ilab data generate
OR, from an ilab shell on a RHEL AI 1.5 compose (to get the traceback):
/opt/app-root/bin/python3.11 -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 56489 --model /var/home/azureuser/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1 --distributed-executor-backend mp --served-model-name /var/home/azureuser/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1 mixtral-8x7b-instruct-v0-1 models/granite-3-1-8b-lab-v2 models/granite-3-1-8b-starter-v2 models/mixtral-8x7b-instruct-v0-1 models/prometheus-8x7b-v2-0 --max-num-seqs 512 --enable-lora --enable-prefix-caching --max-lora-rank 64 --dtype bfloat16 --lora-dtype bfloat16 --fully-sharded-loras --lora-modules skill-classifier-v3-clm=/var/home/azureuser/.cache/instructlab/models/skills-adapter-v3 text-classifier-knowledge-v3-clm=/var/home/azureuser/.cache/instructlab/models/knowledge-adapter-v3 --tensor-parallel-size 1
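If the server comes up, the OpenAI-compatible endpoint started by the command above should respond; a minimal check (a sketch, reusing the host and port shown in that command):
# liveness check against the vLLM OpenAI-compatible server launched above
curl http://127.0.0.1:56489/v1/models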
Expected behavior
- vLLM starts and works when configured with the SDG parameters.
Screenshots
- Attached Image
Device Info (please complete the following information):
- Hardware Specs: MI300X (verification on other accelerators pending)
- OS Version: RHEL AI 1.5
- InstructLab Version: 0.26
Bug impact
- SDG does not work; training is not yet verified.
- The actual traceback is visible at: https://issues.redhat.com/browse/RHELAI-4055?focusedId=27115309&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-27115309
Known workaround
- None documented.
Additional context
- Reported upstream: https://github.com/vllm-project/vllm/issues/16676
- Fixed by: https://github.com/vllm-project/vllm/pull/17671 (a quick check for the fix is sketched after this list)
- Verified on AMD only for now: https://issues.redhat.com/browse/RHELAI-4055
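As referenced above, a quick way to confirm whether a given compose carries the fix is to check the bundled vLLM version (a minimal sketch, assuming the fix ships with vLLM 0.8.z as tracked in RHELAI-4086 below):
# print the vLLM version bundled in the ilab runtime; 0.8.z or newer should include the LoRA fix
/opt/app-root/bin/python3.11 -c "import vllm; print(vllm.__version__)"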
Issue links
- blocks: AIPCC-979 "AMD GPU - Associated changes for vLLM 0.8.z" (Closed)
- is blocked by: RHELAI-4086 "Update support to vLLM 0.8.z to pull in LoRA fix" (Closed)