  Project: Red Hat Enterprise Linux AI
  Issue: RHELAI-3248

Gaudi: vLLM v0.6.4.post2 inference fails for 1.4-dev-freeze



      To Reproduce
      Steps to reproduce the behavior:

      1. podman pull registry.stage.redhat.io/rhelai1/instructlab-intel-rhel9:1.4-1738240991
      2. podman run -it --hooks-dir /tmp --device /dev/accel --device /dev/infiniband instructlab-intel-rhel9:1.4-1738240991 /bin/bash
      3. (app-root)$ PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true vllm serve instructlab/granite-7b-lab --dtype bfloat16 --distributed-executor-backend mp -tp 4
      4. (app-root)$ vllm chat  # in a separate tmux session or similar
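
      For convenience, the reproduction steps above can be collected into a small dry-run script. This is a sketch: it only prints the commands from the steps above so they can be reviewed before running on a Gaudi host (the `repro_cmds` helper and the `IMAGE` variable are illustrative; the run step uses the fully qualified image name, which is equivalent to the short name in step 2):

      ```shell
      #!/bin/sh
      # Sketch: print the reproduction commands from the steps above.
      # Dry run only; paste a printed line into a shell on a Gaudi 3 host
      # with podman installed to actually execute it.
      IMAGE=registry.stage.redhat.io/rhelai1/instructlab-intel-rhel9:1.4-1738240991

      repro_cmds() {
        echo "podman pull $IMAGE"
        echo "podman run -it --hooks-dir /tmp --device /dev/accel --device /dev/infiniband $IMAGE /bin/bash"
        # The remaining commands run inside the container (app-root):
        echo "PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true vllm serve instructlab/granite-7b-lab --dtype bfloat16 --distributed-executor-backend mp -tp 4"
        echo "vllm chat"
      }

      repro_cmds
      ```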

      Expected behavior

      • The vLLM server should begin its boot process, complete its warmup, and start serving the model.
      • The user should be able to chat with the model without failures.

      Device Info (please complete the following information):

      • Hardware Specs: 
        • 8xGaudi 3
      • OS Version: Ubuntu 22.04
      • InstructLab Version: 0.23.1
      • The machine was returned to SMC before a hardware dump could be recorded for this bug report.

      Bug impact

      • Inference via vLLM does not function in this image; users cannot serve models from their Gaudi 3 machines via vLLM.

      Known workaround

      • None

      Additional context

      • The version of vLLM present in this image should be the released v0.6.4 plus a commit that enables the MP backend for distributed inference across more than one Gaudi 3 device.
      • In the TP=1 case, vLLM seems to behave correctly. The attached logs reflect that.
      • In the TP=2 case, vLLM boots, but cannot handle an incoming request.
      • In the TP=4 case, vLLM cannot boot.
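
      The three TP cases above can be swept with a small loop. This is a sketch: `serve_cmd` is a hypothetical helper that prints the `vllm serve` invocation from the reproduction steps for a given tensor-parallel size, so the sweep is a dry run by default:

      ```shell
      #!/bin/sh
      # Sketch: print the vllm serve command for each tensor-parallel size,
      # to reproduce the three behaviors above (TP=1 OK, TP=2 boots but
      # fails on requests, TP=4 fails to boot). Dry run only; paste a
      # printed line into the container shell to actually launch it.
      serve_cmd() {
        echo "PT_HPU_LAZY_MODE=1 PT_HPU_ENABLE_LAZY_COLLECTIVES=true vllm serve instructlab/granite-7b-lab --dtype bfloat16 --distributed-executor-backend mp -tp $1"
      }

      for tp in 1 2 4; do
        serve_cmd "$tp"
      done
      ```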

      Attachments:
        1. vllm-tp1-granite7blab.txt (77 kB, James Kunstle)
        2. vllm-tp2-granite7blab.txt (268 kB, James Kunstle)
        3. vllm-tp4-granite7blab.txt (1.64 MB, James Kunstle)

              fjansen@redhat.com Frank Jansen
              rhn-support-jkunstle James Kunstle (Inactive)
