AI Platform Core Components / AIPCC-726

RHEL AI 1.4.3 - vllm fails to start on AMD for SDG - Ready to be tested


      To Reproduce

      Steps to reproduce the behavior:

      1. Deploy RHEL AI 1.4.3 onto Azure
      2. Prepare the system (ilab config init, download models; the exact commands are sketched after this list)
      3. Run ilab data generate
      4. Observe the assert isinstance(module, BaseLayerWithLoRA) traceback from vllm
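
      For reference, a minimal command sketch of steps 2 and 3, assuming the default RHEL AI teacher model and configuration (the configuration actually used on the affected system is in the attached ilab_cfg.md):

        # Step 2: initialize the InstructLab configuration and fetch the models it references
        ilab config init
        ilab model download
        # Step 3: run synthetic data generation; this is where vllm fails to start
        ilab data generate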

      Expected behavior

      • Successful SDG

      Device Info (please complete the following information):

      • Hardware Specs: Standard_ND96asr_v4 (8*MI300X)
      • OS Version: RHEL AI 1.4.3
      • InstructLab Version: 0.23.3
      • Output of sudo bootc status --format json | jq .status.booted.image.image.image:
        "registry.stage.redhat.io/rhelai1/bootc-azure-amd-rhel9:1.4.3-1741712118"
      • Output of ilab system info:

      Platform:
        sys.version: 3.11.7 (main, Jan  8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
        sys.platform: linux
        os.name: posix
        platform.release: 5.14.0-427.55.1.el9_4.x86_64
        platform.machine: x86_64
        platform.node: fzatlouk-rhelai-1.3-amd-test-westus
        platform.python_version: 3.11.7
        os-release.ID: rhel
        os-release.VERSION_ID: 9.4
        os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
        memory.total: 1820.96 GB
        memory.available: 1784.80 GB
        memory.used: 29.04 GB
      InstructLab:
        instructlab.version: 0.23.3
        instructlab-dolomite.version: 0.2.0
        instructlab-eval.version: 0.5.1
        instructlab-quantize.version: 0.1.0
        instructlab-schema.version: 0.4.2
        instructlab-sdg.version: 0.7.1
        instructlab-training.version: 0.7.0
      Torch:
        torch.version: 2.4.1
        torch.backends.cpu.capability: AVX512
        torch.version.cuda: None
        torch.version.hip: 6.2.41134-65d174c3e
        torch.cuda.available: True
        torch.backends.cuda.is_built: True
        torch.backends.mps.is_built: False
        torch.backends.mps.is_available: False
        torch.cuda.bf16: True
        torch.cuda.current.device: 0
        torch.cuda.0.name: AMD Radeon Graphics
        torch.cuda.0.free: 191.0 GB
        torch.cuda.0.total: 191.5 GB
        torch.cuda.0.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.1.name: AMD Radeon Graphics
        torch.cuda.1.free: 191.0 GB
        torch.cuda.1.total: 191.5 GB
        torch.cuda.1.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.2.name: AMD Radeon Graphics
        torch.cuda.2.free: 191.0 GB
        torch.cuda.2.total: 191.5 GB
        torch.cuda.2.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.3.name: AMD Radeon Graphics
        torch.cuda.3.free: 191.0 GB
        torch.cuda.3.total: 191.5 GB
        torch.cuda.3.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.4.name: AMD Radeon Graphics
        torch.cuda.4.free: 191.0 GB
        torch.cuda.4.total: 191.5 GB
        torch.cuda.4.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.5.name: AMD Radeon Graphics
        torch.cuda.5.free: 191.0 GB
        torch.cuda.5.total: 191.5 GB
        torch.cuda.5.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.6.name: AMD Radeon Graphics
        torch.cuda.6.free: 191.0 GB
        torch.cuda.6.total: 191.5 GB
        torch.cuda.6.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
        torch.cuda.7.name: AMD Radeon Graphics
        torch.cuda.7.free: 191.0 GB
        torch.cuda.7.total: 191.5 GB
        torch.cuda.7.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
      llama_cpp_python:
        llama_cpp_python.version: 0.3.2
        llama_cpp_python.supports_gpu_offload: False

      Bug impact

      • SDG can't be run

      Known workaround

      • N/A

      Additional context

      ilab model serve and ilab model chat work just fine.

      The initial failure seems to be (full log attached):

      ERROR 03-12 17:29:29 engine.py:366] 
      Traceback (most recent call last):
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
          engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
          return cls(ipc_path=ipc_path,
                 ^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
          self.engine = LLMEngine(*args, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 288, in __init__
          self.model_executor = executor_class(vllm_config=vllm_config, )
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
          super().__init__(*args, **kwargs)
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 36, in __init__
          self._init_executor()
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 83, in _init_executor
          self._run_workers("load_model",
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 157, in _run_workers
          driver_worker_output = driver_worker_method(*args, **kwargs)
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model
          self.model_runner.load_model()
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/model_runner.py", line 1124, in load_model
          self.model = self.lora_manager.create_lora_manager(self.model)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/worker_manager.py", line 174, in create_lora_manager
          lora_manager = create_lora_manager(
                         ^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 755, in create_lora_manager
          lora_manager = lora_manager_cls(
                         ^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 678, in __init__
          super().__init__(model, max_num_seqs, max_num_batched_tokens,
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 353, in __init__
          self._create_lora_modules()
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 507, in _create_lora_modules
          self.register_module(module_name, new_module)
        File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 513, in register_module
          assert isinstance(module, BaseLayerWithLoRA)
      AssertionError
      Process SpawnProcess-1
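
      The failing check is the register_module assertion in vllm/lora/models.py shown at the end of the traceback. That code path is only reached when the engine is started with LoRA support enabled (note create_lora_manager in the call chain), which is presumably why plain ilab model serve and ilab model chat work while SDG does not. A minimal Python sketch of what the assertion expects is below; the classes and registry here are simplified stand-ins, not vllm's actual implementation:

        # Illustrative stand-ins only: BaseLayerWithLoRA mimics vllm.lora.layers.BaseLayerWithLoRA,
        # and PlainLinear represents a layer that was never wrapped for LoRA.
        class BaseLayerWithLoRA:
            """A layer that has been wrapped with LoRA support."""

        class PlainLinear:
            """A layer the LoRA wrapping step skipped or does not support."""

        modules = {}

        def register_module(module_name: str, module) -> None:
            # Mirrors the assertion at vllm/lora/models.py line 513 in the traceback:
            # every module handed to the LoRA manager must already be LoRA-aware,
            # otherwise engine startup aborts with AssertionError.
            assert isinstance(module, BaseLayerWithLoRA)
            modules[module_name] = module

        register_module("wrapped_proj", BaseLayerWithLoRA())  # succeeds
        register_module("unwrapped_proj", PlainLinear())      # AssertionError, as seen in the log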

       

      Attachments:
        1. ilab_data_generate_amd.log (45 kB, František Zatloukal)
        2. ilab_cfg.md (20 kB, František Zatloukal)
        3. sdg_abort.log (503 kB, František Zatloukal)

      Assignee: Joseph Groenenboom (rh-ee-jgroenen)
      Reporter: František Zatloukal (fzatlouk@redhat.com)