- Bug
- Resolution: Done
- Undefined
- rhelai-1.5
- None
To Reproduce
Steps to reproduce the behavior:
ilab data generate
OR, from an ilab shell on a RHEL AI 1.5 compose (to get the traceback):
/opt/app-root/bin/python3.11 -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 56489 --model /var/home/azureuser/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1 --distributed-executor-backend mp --served-model-name /var/home/azureuser/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1 mixtral-8x7b-instruct-v0-1 models/granite-3-1-8b-lab-v2 models/granite-3-1-8b-starter-v2 models/mixtral-8x7b-instruct-v0-1 models/prometheus-8x7b-v2-0 --max-num-seqs 512 --enable-lora --enable-prefix-caching --max-lora-rank 64 --dtype bfloat16 --lora-dtype bfloat16 --fully-sharded-loras --lora-modules skill-classifier-v3-clm=/var/home/azureuser/.cache/instructlab/models/skills-adapter-v3 text-classifier-knowledge-v3-clm=/var/home/azureuser/.cache/instructlab/models/knowledge-adapter-v3 --tensor-parallel-size 1
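If the server comes up, the OpenAI-compatible endpoint started by the command above should respond; a minimal check (a sketch, reusing the host and port shown in that command):
# liveness check against the vLLM OpenAI-compatible server launched above
curl http://127.0.0.1:56489/v1/models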
Expected behavior
- vLLM starts and works when configured with the SDG parameters.
Screenshots
- Attached Image
Device Info (please complete the following information):
- Hardware Specs: MI300X (verification on other accelerators pending)
- OS Version: RHEL AI 1.5
- InstructLab Version: 0.26
Bug impact
- SDG does not work; training is not yet verified.
- The actual traceback is visible at: https://issues.redhat.com/browse/RHELAI-4055?focusedId=27115309&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-27115309
Known workaround
- None documented.
Additional context
- Reported upstream: https://github.com/vllm-project/vllm/issues/16676
- Fixed by: https://github.com/vllm-project/vllm/pull/17671 (a quick check for the fix is sketched after this list)
- Verified on AMD only for now: https://issues.redhat.com/browse/RHELAI-4055
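As referenced above, a quick way to confirm whether a given compose carries the fix is to check the bundled vLLM version (a minimal sketch, assuming the fix ships with vLLM 0.8.z as tracked in RHELAI-4086 below):
# print the vLLM version bundled in the ilab runtime; 0.8.z or newer should include the LoRA fix
/opt/app-root/bin/python3.11 -c "import vllm; print(vllm.__version__)"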
Issue links
- blocks: AIPCC-979 "AMD GPU - Associated changes for vLLM 0.8.z" (Closed)
- is blocked by: RHELAI-4086 "Update support to vLLM 0.8.z to pull in LoRA fix" (Closed)