- Bug
- Resolution: Done
- Critical
- rhelai-1.4.3
- False
- False
- Known Issue
- AIPCC Sprint 2, AIPCC Application Platform 3
- Approved
To Reproduce
Steps to reproduce the behavior:
- Deploy RHEL AI 1.4.3 onto Azure
- Prepare the system (ilab config init, download models)
- Run ilab data generate
- Observe the vLLM traceback ending in assert isinstance(module, BaseLayerWithLoRA) (see the command sketch below)
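A minimal sketch of the commands behind these steps (this assumes the default RHEL AI workflow; add whatever repository or model flags you normally use for the download step):

    ilab config init        # prepare the system configuration
    ilab model download     # fetch the models required for SDG
    ilab data generate      # fails with the vLLM LoRA assertion shown below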
Expected behavior
- Successful SDG
Device Info (please complete the following information):
- Hardware Specs: Standard_ND96asr_v4 (8*MI300X)
- OS Version: RHEL AI 1.4.3
- InstructLab Version: 0.23.3
- Provide the output of these two commands:
- sudo bootc status --format json | jq .status.booted.image.image.image :
- "registry.stage.redhat.io/rhelai1/bootc-azure-amd-rhel9:1.4.3-1741712118"
- ilab system info :
Platform:
  sys.version: 3.11.7 (main, Jan 8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
  sys.platform: linux
  os.name: posix
  platform.release: 5.14.0-427.55.1.el9_4.x86_64
  platform.machine: x86_64
  platform.node: fzatlouk-rhelai-1.3-amd-test-westus
  platform.python_version: 3.11.7
  os-release.ID: rhel
  os-release.VERSION_ID: 9.4
  os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
  memory.total: 1820.96 GB
  memory.available: 1784.80 GB
  memory.used: 29.04 GB
InstructLab:
  instructlab.version: 0.23.3
  instructlab-dolomite.version: 0.2.0
  instructlab-eval.version: 0.5.1
  instructlab-quantize.version: 0.1.0
  instructlab-schema.version: 0.4.2
  instructlab-sdg.version: 0.7.1
  instructlab-training.version: 0.7.0
Torch:
  torch.version: 2.4.1
  torch.backends.cpu.capability: AVX512
  torch.version.cuda: None
  torch.version.hip: 6.2.41134-65d174c3e
  torch.cuda.available: True
  torch.backends.cuda.is_built: True
  torch.backends.mps.is_built: False
  torch.backends.mps.is_available: False
  torch.cuda.bf16: True
  torch.cuda.current.device: 0
  torch.cuda.0.name: AMD Radeon Graphics
  torch.cuda.0.free: 191.0 GB
  torch.cuda.0.total: 191.5 GB
  torch.cuda.0.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.1.name: AMD Radeon Graphics
  torch.cuda.1.free: 191.0 GB
  torch.cuda.1.total: 191.5 GB
  torch.cuda.1.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.2.name: AMD Radeon Graphics
  torch.cuda.2.free: 191.0 GB
  torch.cuda.2.total: 191.5 GB
  torch.cuda.2.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.3.name: AMD Radeon Graphics
  torch.cuda.3.free: 191.0 GB
  torch.cuda.3.total: 191.5 GB
  torch.cuda.3.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.4.name: AMD Radeon Graphics
  torch.cuda.4.free: 191.0 GB
  torch.cuda.4.total: 191.5 GB
  torch.cuda.4.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.5.name: AMD Radeon Graphics
  torch.cuda.5.free: 191.0 GB
  torch.cuda.5.total: 191.5 GB
  torch.cuda.5.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.6.name: AMD Radeon Graphics
  torch.cuda.6.free: 191.0 GB
  torch.cuda.6.total: 191.5 GB
  torch.cuda.6.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.7.name: AMD Radeon Graphics
  torch.cuda.7.free: 191.0 GB
  torch.cuda.7.total: 191.5 GB
  torch.cuda.7.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
llama_cpp_python:
  llama_cpp_python.version: 0.3.2
  llama_cpp_python.supports_gpu_offload: False
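If it helps triage, the two diagnostics above can also be captured into a single file for attaching to the ticket (the output filename is arbitrary):

    { sudo bootc status --format json | jq .status.booted.image.image.image; ilab system info; } > rhelai-sysinfo.txt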
Bug impact
- SDG can't be run
Known workaround
- N/A
Additional context
ilab model serve and ilab model chat work just fine.
The initial failure seems to be (full log attached):
ERROR 03-12 17:29:29 engine.py:366] Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
    return cls(ipc_path=ipc_path,
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
    self.engine = LLMEngine(*args, **kwargs)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 288, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 36, in __init__
    self._init_executor()
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 83, in _init_executor
    self._run_workers("load_model",
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 157, in _run_workers
    driver_worker_output = driver_worker_method(*args, **kwargs)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model
    self.model_runner.load_model()
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/model_runner.py", line 1124, in load_model
    self.model = self.lora_manager.create_lora_manager(self.model)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/worker_manager.py", line 174, in create_lora_manager
    lora_manager = create_lora_manager(
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 755, in create_lora_manager
    lora_manager = lora_manager_cls(
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 678, in __init__
    super().__init__(model, max_num_seqs, max_num_batched_tokens,
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 353, in __init__
    self._create_lora_modules()
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 507, in _create_lora_modules
    self.register_module(module_name, new_module)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/lora/models.py", line 513, in register_module
    assert isinstance(module, BaseLayerWithLoRA)
AssertionError
Process SpawnProcess-1
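Since every frame below the model load goes through vllm/lora/*, the crash appears specific to vLLM's LoRA path, which the SDG teacher-model startup exercises but a plain ilab model serve does not. A hedged isolation sketch for confirming whether the problem sits in vLLM's ROCm LoRA support rather than in ilab itself (the model path, adapter path, and adapter name below are assumptions, not values taken from this report; only standard vLLM OpenAI-server flags are used):

    # Start vLLM directly with LoRA enabled to hit the same code path as SDG.
    # Adjust the paths to whatever ilab model download actually placed on disk.
    python -m vllm.entrypoints.openai.api_server \
        --model "$HOME/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1" \
        --tensor-parallel-size 8 \
        --enable-lora \
        --lora-modules "skills-adapter=$HOME/.cache/instructlab/models/skills-adapter-v3"

If this minimal invocation fails with the same assert isinstance(module, BaseLayerWithLoRA), that would point at the ROCm vLLM LoRA support tracked in the linked AIPCC issues rather than at anything SDG-specific.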
- blocks: AIPCC-979 AMD GPU - Associated changes for vLLM 0.8.z (Closed)
- is blocked by: AIPCC-1379 Update ROCm vLLM support to 0.8.z (Closed)
- is cloned by: AIPCC-1498 RHEL AI 1.5 - vLLM fails to start on during training when using a separate data disk (Review)
- is duplicated by: RHELAI-3659 Failing to start vllm with mixtral on AMD (Closed)
- mentioned on