- Bug
- Resolution: Unresolved
- Version: rhelai-1.5
To Reproduce
Steps to reproduce the behavior:
- ilab model download docker://registry.redhat.io/rhelai1/llama-4-maverick-17b-128e-instruct-fp8 --release 1.5
- ilab model serve --model-path ~/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8
Using these vllm_args:
- --tensor-parallel-size
- '8'
- --max-model-len
- '16384'
- --uvicorn-log-level
- debug
- --trust-remote-code
Fails with:
[cloud-user@ip-172-31-36-40 ~]$ ilab model serve --model-path ~/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8/
Parameters:
model_path: PosixPath('/var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8') [type: pathlib.PosixPath, src: commandline]
model_id: None [type: None, src: default]
gpu_layers: -1 [type: int, src: default_map]
num_threads: None [type: None, src: default]
max_ctx_size: 4096 [type: int, src: default_map]
model_family: None [type: None, src: default]
log_file: None [type: None, src: default]
chat_template: 'auto' [type: str, src: default_map]
backend: 'vllm' [type: str, src: default_map]
gpus: 8 [type: int, src: default_map]
host: '127.0.0.1' [type: str, src: default_map]
port: 8000 [type: int, src: default_map]
DEBUG 2025-05-09 21:57:13,187 instructlab.model.backends.backends:74: Auto-detecting backend for model /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8
--- Logging error ---
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/utils.py", line 786, in is_model_safetensors
json.load(f)
File "/usr/lib64/python3.11/json/_init_.py", line 293, in load
return loads(fp.read(),
^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/json/_init_.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 23 column 1 (char 447)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib64/python3.11/logging/_init_.py", line 1110, in emit
msg = self.format(record)
^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/logging/_init_.py", line 953, in format
return fmt.format(record)
^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/log.py", line 19, in format
return super().format(record)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/logging/_init_.py", line 687, in format
record.message = record.getMessage()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib64/python3.11/logging/_init_.py", line 377, in getMessage
msg = msg % self.args
~~~^~~~~~~~~~
TypeError: not all arguments converted during string formatting
Call stack:
File "/opt/app-root/bin/ilab", line 8, in <module>
sys.exit(ilab())
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1161, in _call_
return self.main(*args, **kwargs)
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
File "/opt/app-root/lib64/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/clickext.py", line 356, in wrapper
return f(*args, **kwargs)
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/cli/model/serve.py", line 149, in serve
serve_backend(
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/serve_backend.py", line 76, in serve_backend
backend = backends.get(model_path, backend)
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 76, in get
auto_detected_backend, auto_detected_backend_reason = determine_backend(
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 30, in determine_backend
if model_path.is_dir() and is_model_safetensors(model_path):
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/utils.py", line 788, in is_model_safetensors
logger.debug("'%s' is not a valid JSON file: e", file, e)
Message: "'%s' is not a valid JSON file: e"
Arguments: (PosixPath('/var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8/special_tokens_map.json'), JSONDecodeError('Expecting property name enclosed in double quotes: line 23 column 1 (char 447)'))
DEBUG 2025-05-09 21:57:16,151 instructlab.utils:804: GGUF Path /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8 is a directory
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 76, in get
auto_detected_backend, auto_detected_backend_reason = determine_backend(
^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 57, in determine_backend
raise ValueError(
ValueError: The model file /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8 is not a GGUF format nor a directory containing huggingface safetensors files. Cannot determine which backend to use.
Please use a GGUF file for llama-cpp or a directory containing huggingface safetensors files for vllm.
Note that vLLM is only supported on Linux.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/serve_backend.py", line 76, in serve_backend
backend = backends.get(model_path, backend)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 80, in get
raise ValueError(f"Cannot determine which backend to use:
ValueError: Cannot determine which backend to use: The model file /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8 is not a GGUF format nor a directory containing huggingface safetensors files. Cannot determine which backend to use.
Please use a GGUF file for llama-cpp or a directory containing huggingface safetensors files for vllm.
Note that vLLM is only supported on Linux.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/app-root/bin/ilab", line 8, in <module>
sys.exit(ilab())
^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1161, in _call_
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1082, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1443, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 788, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/click/decorators.py", line 33, in new_func
return f(get_current_context(), *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/clickext.py", line 356, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/cli/model/serve.py", line 149, in serve
serve_backend(
File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/serve_backend.py", line 78, in serve_backend
raise ValueError(f"Failed to determine backend: {e}
") from e
ValueError: Failed to determine backend: Cannot determine which backend to use: The model file /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8 is not a GGUF format nor a directory containing huggingface safetensors files. Cannot determine which backend to use.
Please use a GGUF file for llama-cpp or a directory containing huggingface safetensors files for vllm.
Note that vLLM is only supported on Linux.
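Two distinct problems show up in the output above. The primary one: backend auto-detection rejects the model directory because special_tokens_map.json fails to parse (json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 23 column 1), so is_model_safetensors() returns False and determine_backend() concludes the path is neither a GGUF file nor a safetensors directory. The secondary one: the debug statement that should report the parse failure is itself malformed; its format string contains a single %s but two arguments are passed, which raises the TypeError behind the "--- Logging error ---" block. Below is a minimal, self-contained sketch that reproduces that secondary logging failure and shows the presumed intended call; the corrected format string is an assumption about intent, not taken from the instructlab source.

import logging
from pathlib import Path

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("repro")

file = Path("special_tokens_map.json")                                # stand-in for the real path
e = ValueError("Expecting property name enclosed in double quotes")   # stand-in exception

# Pattern seen at instructlab/utils.py line 788 in the traceback: one %s placeholder, two arguments.
# logging catches the resulting TypeError and prints a "--- Logging error ---" block instead of the message.
logger.debug("'%s' is not a valid JSON file: e", file, e)

# Presumed intended call: a second %s so the exception text is rendered as well.
logger.debug("'%s' is not a valid JSON file: %s", file, e)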
Expected behavior
The model should be served successfully.
Screenshots
- Attached Image
Device Info (please complete the following information):
- Hardware Specs: [e.g. Apple M2 Pro Chip, 16 GB Memory, etc.]
- OS Version: [e.g. Mac OS 14.4.1, Fedora Linux 40]
- InstructLab Version: [output of ilab --version]
- Provide the output of these two commands:
- sudo bootc status --format json | jq .status.booted.image.image.image to print the name and tag of the bootc image, should look like registry.stage.redhat.io/rhelai1/bootc-intel-rhel9:1.3-1732894187
- ilab system info to print detailed information about InstructLab version, OS, and hardware – including GPU / AI accelerator hardware
[cloud-user@ip-172-31-36-40 ~]$ ilab --version
ilab, version 0.26.1
[cloud-user@ip-172-31-36-40 ~]$ ilab system info
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 CUDA devices:
Device 0: NVIDIA L40S, compute capability 8.9, VMM: yes
Device 1: NVIDIA L40S, compute capability 8.9, VMM: yes
Device 2: NVIDIA L40S, compute capability 8.9, VMM: yes
Device 3: NVIDIA L40S, compute capability 8.9, VMM: yes
Device 4: NVIDIA L40S, compute capability 8.9, VMM: yes
Device 5: NVIDIA L40S, compute capability 8.9, VMM: yes
Device 6: NVIDIA L40S, compute capability 8.9, VMM: yes
Device 7: NVIDIA L40S, compute capability 8.9, VMM: yes
Platform:
sys.version: 3.11.7 (main, Jan 8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
sys.platform: linux
os.name: posix
platform.release: 5.14.0-427.65.1.el9_4.x86_64
platform.machine: x86_64
platform.node: ip-172-31-36-40.us-east-2.compute.internal
platform.python_version: 3.11.7
os-release.ID: rhel
os-release.VERSION_ID: 9.4
os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
memory.total: 1492.02 GB
memory.available: 1481.33 GB
memory.used: 2.87 GB
InstructLab:
instructlab.version: 0.26.1
instructlab-dolomite.version: 0.2.0
instructlab-eval.version: 0.5.1
instructlab-quantize.version: 0.1.0
instructlab-schema.version: 0.4.2
instructlab-sdg.version: 0.8.2
instructlab-training.version: 0.10.2
Torch:
torch.version: 2.6.0
torch.backends.cpu.capability: AVX2
torch.version.cuda: 12.4
torch.version.hip: None
torch.cuda.available: True
torch.backends.cuda.is_built: True
torch.backends.mps.is_built: False
torch.backends.mps.is_available: False
torch.cuda.bf16: True
torch.cuda.current.device: 0
torch.cuda.0.name: NVIDIA L40S
torch.cuda.0.free: 43.9 GB
torch.cuda.0.total: 44.3 GB
torch.cuda.0.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.1.name: NVIDIA L40S
torch.cuda.1.free: 43.9 GB
torch.cuda.1.total: 44.3 GB
torch.cuda.1.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.2.name: NVIDIA L40S
torch.cuda.2.free: 43.9 GB
torch.cuda.2.total: 44.3 GB
torch.cuda.2.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.3.name: NVIDIA L40S
torch.cuda.3.free: 43.9 GB
torch.cuda.3.total: 44.3 GB
torch.cuda.3.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.4.name: NVIDIA L40S
torch.cuda.4.free: 43.9 GB
torch.cuda.4.total: 44.3 GB
torch.cuda.4.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.5.name: NVIDIA L40S
torch.cuda.5.free: 43.9 GB
torch.cuda.5.total: 44.3 GB
torch.cuda.5.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.6.name: NVIDIA L40S
torch.cuda.6.free: 43.9 GB
torch.cuda.6.total: 44.3 GB
torch.cuda.6.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.7.name: NVIDIA L40S
torch.cuda.7.free: 43.9 GB
torch.cuda.7.total: 44.3 GB
torch.cuda.7.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
llama_cpp_python:
llama_cpp_python.version: 0.3.6
llama_cpp_python.supports_gpu_offload: True
Bug impact
- The llama-4-maverick-17b-128e-instruct-fp8 model downloaded for RHEL AI 1.5 cannot be served: ilab model serve aborts during backend auto-detection, before vLLM is ever launched.
Known workaround
- None confirmed; a diagnostic sketch that narrows down the broken file follows below.
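The JSONDecodeError above points at special_tokens_map.json, line 23, column 1, which typically indicates a trailing comma or a similar syntax slip in that file. A minimal sketch, assuming the default cache path shown in the report, that checks every JSON file in the model directory and reports exactly where parsing fails:

import json
from pathlib import Path

# Assumed location, matching the path used in the commands above.
model_dir = Path.home() / ".cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8"

for path in sorted(model_dir.glob("*.json")):
    try:
        json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as err:
        # Report the file and the exact position the parser complains about.
        print(f"INVALID {path.name}: {err.msg} (line {err.lineno}, column {err.colno})")
    else:
        print(f"ok      {path.name}")

If special_tokens_map.json is the only file that fails, hand-correcting it into valid JSON (or re-downloading the model) should let backend auto-detection recognize the directory as safetensors again; whether the file is broken in the registry image or was corrupted locally cannot be determined from this log alone.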