Type: Bug
Resolution: Done
Priority: Critical
Fix Version: rhelai-1.5.3
To Reproduce
Steps to reproduce the behavior:
- Prepare RHEL AI 1.5.3 on an NVIDIA instance (tested on both A100 and H100)
- Run ilab data generate
Expected behavior
- Successful SDG run
Screenshots
- Attached Image
Device Info (please complete the following information):
- Hardware Specs: NVIDIA A100 (x8) or NVIDIA H100 (x8)
- OS Version: RHEL AI 1.5.3-2
- InstructLab Version: ilab, version 0.26.1
- Output of the two requested commands:
- registry.stage.redhat.io/rhelai1/bootc-azure-nvidia-rhel9:1.5.3-1754022569
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 CUDA devices:
  Device 0: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 1: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 2: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 3: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 4: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 5: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 6: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 7: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
Platform:
  sys.version: 3.11.7 (main, Jun 25 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-4)]
  sys.platform: linux
  os.name: posix
  platform.release: 5.14.0-427.77.1.el9_4.x86_64
  platform.machine: x86_64
  platform.node: fzatlouk-rhelai-1.5-nvidia-test
  platform.python_version: 3.11.7
  os-release.ID: rhel
  os-release.VERSION_ID: 9.4
  os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
  memory.total: 885.80 GB
  memory.available: 877.41 GB
  memory.used: 4.46 GB
InstructLab:
  instructlab.version: 0.26.1
  instructlab-dolomite.version: 0.2.0
  instructlab-eval.version: 0.5.1
  instructlab-quantize.version: 0.1.0
  instructlab-schema.version: 0.4.2
  instructlab-sdg.version: 0.8.3
  instructlab-training.version: 0.10.3
Torch:
  torch.version: 2.6.0
  torch.backends.cpu.capability: AVX2
  torch.version.cuda: 12.4
  torch.version.hip: None
  torch.cuda.available: True
  torch.backends.cuda.is_built: True
  torch.backends.mps.is_built: False
  torch.backends.mps.is_available: False
  torch.cuda.bf16: True
  torch.cuda.current.device: 0
  torch.cuda.0.name: NVIDIA A100-SXM4-40GB
  torch.cuda.0.free: 39.0 GB
  torch.cuda.0.total: 39.4 GB
  torch.cuda.0.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.1.name: NVIDIA A100-SXM4-40GB
  torch.cuda.1.free: 39.0 GB
  torch.cuda.1.total: 39.4 GB
  torch.cuda.1.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.2.name: NVIDIA A100-SXM4-40GB
  torch.cuda.2.free: 39.0 GB
  torch.cuda.2.total: 39.4 GB
  torch.cuda.2.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.3.name: NVIDIA A100-SXM4-40GB
  torch.cuda.3.free: 39.0 GB
  torch.cuda.3.total: 39.4 GB
  torch.cuda.3.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.4.name: NVIDIA A100-SXM4-40GB
  torch.cuda.4.free: 39.0 GB
  torch.cuda.4.total: 39.4 GB
  torch.cuda.4.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.5.name: NVIDIA A100-SXM4-40GB
  torch.cuda.5.free: 39.0 GB
  torch.cuda.5.total: 39.4 GB
  torch.cuda.5.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.6.name: NVIDIA A100-SXM4-40GB
  torch.cuda.6.free: 39.0 GB
  torch.cuda.6.total: 39.4 GB
  torch.cuda.6.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.7.name: NVIDIA A100-SXM4-40GB
  torch.cuda.7.free: 39.0 GB
  torch.cuda.7.total: 39.4 GB
  torch.cuda.7.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
llama_cpp_python:
  llama_cpp_python.version: 0.3.6
  llama_cpp_python.supports_gpu_offload: True
Bug impact
- SDG is not working
Known workaround
- N/A
Additional context
ilab chat and serve work just fine.
First issue:
(VllmWorkerProcess pid=487) Message: 'Cannot use FlashAttention-2 backend for head size %d.'
(VllmWorkerProcess pid=487) Arguments: (None,)
(VllmWorkerProcess pid=487) INFO 08-01 10:42:42 [cuda.py:289] Using XFormers backend.
(VllmWorkerProcess pid=488) --- Logging error ---
(VllmWorkerProcess pid=488) Traceback (most recent call last):
(VllmWorkerProcess pid=488)   File "/usr/lib64/python3.11/logging/__init__.py", line 1110, in emit
(VllmWorkerProcess pid=488)     msg = self.format(record)
(VllmWorkerProcess pid=488)           ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=488)   File "/usr/lib64/python3.11/logging/__init__.py", line 953, in format
(VllmWorkerProcess pid=488)     return fmt.format(record)
(VllmWorkerProcess pid=488)            ^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=488)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/logging_utils/formatter.py", line 13, in format
(VllmWorkerProcess pid=488)     msg = logging.Formatter.format(self, record)
(VllmWorkerProcess pid=488)           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=488)   File "/usr/lib64/python3.11/logging/__init__.py", line 687, in format
(VllmWorkerProcess pid=488)     record.message = record.getMessage()
(VllmWorkerProcess pid=488)                      ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=488)   File "/usr/lib64/python3.11/logging/__init__.py", line 377, in getMessage
(VllmWorkerProcess pid=488)     msg = msg % self.args
(VllmWorkerProcess pid=488)           ~~~~^~~~~~~~~~~
(VllmWorkerProcess pid=488) TypeError: %d format: a real number is required, not NoneType
This causes vLLM to fall back to the XFormers backend: INFO 08-01 10:42:42 [cuda.py:289] Using XFormers backend.
Subsequently, model loading fails under the XFormers fallback as well:
INFO 2025-08-01 10:42:47,549 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:60237/v1, this might take a moment... Attempt: 15/1200
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/utils.py", line 2378, in run_method
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     return func(*args, **kwargs)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.model_runner.load_model()
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/model_runner.py", line 1113, in load_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.model = get_model(vllm_config=self.vllm_config)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     return loader.load_model(vllm_config=vllm_config)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 452, in load_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     model = _initialize_model(vllm_config=vllm_config)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     return model_class(vllm_config=vllm_config, prefix=prefix)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 438, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.model = MixtralModel(vllm_config=vllm_config,
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 151, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 276, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.start_layer, self.end_layer, self.layers = make_layers(
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                                                     ^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/utils.py", line 609, in make_layers
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     [PPMissingLayer() for _ in range(start_layer)] + [
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                                                      ^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/utils.py", line 610, in <listcomp>
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 278, in <lambda>
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     lambda prefix: MixtralDecoderLayer(
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                    ^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 205, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.self_attn = MixtralAttention(
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                      ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 143, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.q_size = self.num_heads * self.head_dim
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
(VllmWorkerProcess pid=489) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=489) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=489) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
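The fatal error at the bottom of the traceback reduces to multiplying an int by None: MixtralAttention computes q_size = num_heads * head_dim, and head_dim arrives as None because this vLLM build fails to derive it from the model config. A minimal sketch with illustrative values (not vLLM's actual classes):

```python
# num_heads is a hypothetical value; any int reproduces the same error.
# head_dim is None, standing in for the value missing from the parsed
# Mixtral config in the affected vLLM build.
num_heads = 32
head_dim = None

# Mirrors mixtral.py line 143: self.q_size = self.num_heads * self.head_dim
try:
    q_size = num_heads * head_dim
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for *: 'int' and 'NoneType'
```

The same None head size is what earlier broke the FlashAttention-2 log message, so both symptoms share one root cause.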
Linked issues:
- depends on: AIPCC-4556 Bump vllm tag for RHEL AI 1.5.z release (Closed)
- is blocked by: AIPCC-4046 test-accelerated-cuda-ubi9-{*}-bootstrap-and-onboard for Builder 14.2-maint fail (Closed)
- mentioned on: 9 items