Type: Bug
Resolution: Done
Priority: Critical
Fix Version: rhelai-1.5.3
To Reproduce
Steps to reproduce the behavior:
- Prepare RHEL AI 1.5.3 on an NVIDIA instance (tested on both A100 and H100)
- Run ilab data generate
Expected behavior
- Successful SDG run
Screenshots
- Attached Image
Device Info (please complete the following information):
- Hardware Specs: NVIDIA A100 (x8) or NVIDIA H100 (x8)
- OS Version: RHEL AI 1.5.3-2
- InstructLab Version: ilab, version 0.26.1
- Output of the two requested commands:
- registry.stage.redhat.io/rhelai1/bootc-azure-nvidia-rhel9:1.5.3-1754022569
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 CUDA devices:
  Device 0: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 1: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 2: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 3: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 4: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 5: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 6: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
  Device 7: NVIDIA A100-SXM4-40GB, compute capability 8.0, VMM: yes
Platform:
  sys.version: 3.11.7 (main, Jun 25 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-4)]
  sys.platform: linux
  os.name: posix
  platform.release: 5.14.0-427.77.1.el9_4.x86_64
  platform.machine: x86_64
  platform.node: fzatlouk-rhelai-1.5-nvidia-test
  platform.python_version: 3.11.7
  os-release.ID: rhel
  os-release.VERSION_ID: 9.4
  os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
  memory.total: 885.80 GB
  memory.available: 877.41 GB
  memory.used: 4.46 GB
InstructLab:
  instructlab.version: 0.26.1
  instructlab-dolomite.version: 0.2.0
  instructlab-eval.version: 0.5.1
  instructlab-quantize.version: 0.1.0
  instructlab-schema.version: 0.4.2
  instructlab-sdg.version: 0.8.3
  instructlab-training.version: 0.10.3
Torch:
  torch.version: 2.6.0
  torch.backends.cpu.capability: AVX2
  torch.version.cuda: 12.4
  torch.version.hip: None
  torch.cuda.available: True
  torch.backends.cuda.is_built: True
  torch.backends.mps.is_built: False
  torch.backends.mps.is_available: False
  torch.cuda.bf16: True
  torch.cuda.current.device: 0
  torch.cuda.0.name: NVIDIA A100-SXM4-40GB
  torch.cuda.0.free: 39.0 GB
  torch.cuda.0.total: 39.4 GB
  torch.cuda.0.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.1.name: NVIDIA A100-SXM4-40GB
  torch.cuda.1.free: 39.0 GB
  torch.cuda.1.total: 39.4 GB
  torch.cuda.1.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.2.name: NVIDIA A100-SXM4-40GB
  torch.cuda.2.free: 39.0 GB
  torch.cuda.2.total: 39.4 GB
  torch.cuda.2.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.3.name: NVIDIA A100-SXM4-40GB
  torch.cuda.3.free: 39.0 GB
  torch.cuda.3.total: 39.4 GB
  torch.cuda.3.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.4.name: NVIDIA A100-SXM4-40GB
  torch.cuda.4.free: 39.0 GB
  torch.cuda.4.total: 39.4 GB
  torch.cuda.4.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.5.name: NVIDIA A100-SXM4-40GB
  torch.cuda.5.free: 39.0 GB
  torch.cuda.5.total: 39.4 GB
  torch.cuda.5.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.6.name: NVIDIA A100-SXM4-40GB
  torch.cuda.6.free: 39.0 GB
  torch.cuda.6.total: 39.4 GB
  torch.cuda.6.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.7.name: NVIDIA A100-SXM4-40GB
  torch.cuda.7.free: 39.0 GB
  torch.cuda.7.total: 39.4 GB
  torch.cuda.7.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
llama_cpp_python:
  llama_cpp_python.version: 0.3.6
  llama_cpp_python.supports_gpu_offload: True
Bug impact
- SDG is not working
Known workaround
- N/A
Additional context
ilab chat and serve work just fine.
First issue:
(VllmWorkerProcess pid=487) Message: 'Cannot use FlashAttention-2 backend for head size %d.'
(VllmWorkerProcess pid=487) Arguments: (None,)
(VllmWorkerProcess pid=487) INFO 08-01 10:42:42 [cuda.py:289] Using XFormers backend.
(VllmWorkerProcess pid=488) --- Logging error ---
(VllmWorkerProcess pid=488) Traceback (most recent call last):
(VllmWorkerProcess pid=488)   File "/usr/lib64/python3.11/logging/__init__.py", line 1110, in emit
(VllmWorkerProcess pid=488)     msg = self.format(record)
(VllmWorkerProcess pid=488)           ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=488)   File "/usr/lib64/python3.11/logging/__init__.py", line 953, in format
(VllmWorkerProcess pid=488)     return fmt.format(record)
(VllmWorkerProcess pid=488)            ^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=488)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/logging_utils/formatter.py", line 13, in format
(VllmWorkerProcess pid=488)     msg = logging.Formatter.format(self, record)
(VllmWorkerProcess pid=488)           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=488)   File "/usr/lib64/python3.11/logging/__init__.py", line 687, in format
(VllmWorkerProcess pid=488)     record.message = record.getMessage()
(VllmWorkerProcess pid=488)                      ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=488)   File "/usr/lib64/python3.11/logging/__init__.py", line 377, in getMessage
(VllmWorkerProcess pid=488)     msg = msg % self.args
(VllmWorkerProcess pid=488)           ~~~~^~~~~~~~~~~
(VllmWorkerProcess pid=488) TypeError: %d format: a real number is required, not NoneType
This causes vLLM to fall back to the XFormers backend: INFO 08-01 10:42:42 [cuda.py:289] Using XFormers backend.
Subsequently, model loading fails under the XFormers fallback as well:
INFO 2025-08-01 10:42:47,549 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:60237/v1, this might take a moment... Attempt: 15/1200
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     output = run_method(worker, method, args, kwargs)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/utils.py", line 2378, in run_method
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     return func(*args, **kwargs)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]            ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 183, in load_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.model_runner.load_model()
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/model_runner.py", line 1113, in load_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.model = get_model(vllm_config=self.vllm_config)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     return loader.load_model(vllm_config=vllm_config)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 452, in load_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     model = _initialize_model(vllm_config=vllm_config)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/model_loader/loader.py", line 133, in _initialize_model
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     return model_class(vllm_config=vllm_config, prefix=prefix)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 438, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.model = MixtralModel(vllm_config=vllm_config,
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 151, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 276, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.start_layer, self.end_layer, self.layers = make_layers(
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                                                     ^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/utils.py", line 609, in make_layers
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     [PPMissingLayer() for _ in range(start_layer)] + [
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                                                      ^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/utils.py", line 610, in <listcomp>
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 278, in <lambda>
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     lambda prefix: MixtralDecoderLayer(
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                    ^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 205, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.self_attn = MixtralAttention(
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                      ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/mixtral.py", line 143, in __init__
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]     self.q_size = self.num_heads * self.head_dim
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]                   ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~
(VllmWorkerProcess pid=486) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
(VllmWorkerProcess pid=489) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] Exception in worker VllmWorkerProcess while processing method load_model.
(VllmWorkerProcess pid=489) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238] Traceback (most recent call last):
(VllmWorkerProcess pid=489) ERROR 08-01 10:42:47 [multiproc_worker_utils.py:238]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 232, in _run_worker_process
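The fatal error at the bottom of the traceback reduces to multiplying an int by None: MixtralAttention computes q_size = num_heads * head_dim, and head_dim arrives as None because this vLLM build fails to derive it from the model config. A minimal sketch with illustrative values (not vLLM's actual classes):

```python
# num_heads is a hypothetical value; any int reproduces the same error.
# head_dim is None, standing in for the value missing from the parsed
# Mixtral config in the affected vLLM build.
num_heads = 32
head_dim = None

# Mirrors mixtral.py line 143: self.q_size = self.num_heads * self.head_dim
try:
    q_size = num_heads * head_dim
except TypeError as exc:
    print(exc)  # unsupported operand type(s) for *: 'int' and 'NoneType'
```

The same None head size is what earlier broke the FlashAttention-2 log message, so both symptoms share one root cause.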
Linked issues:
- depends on: AIPCC-4556 Bump vllm tag for RHEL AI 1.5.z release (Closed)
- is blocked by: AIPCC-4046 test-accelerated-cuda-ubi9-{*}-bootstrap-and-onboard for Builder 14.2-maint fail (Closed)
- mentioned on: 9 items