AI Platform Core Components / AIPCC-1434

registry.redhat.io/rhelai1/llama-4-maverick-17b-128e-instruct-fp8:1.5 fails with model list and serve


    • Type: Bug
    • Resolution: Unresolved
    • rhelai-1.5
    • Model Validation

      To Reproduce

      Steps to reproduce the behavior:

      1. ilab model download docker://registry.redhat.io/rhelai1/llama-4-maverick-17b-128e-instruct-fp8 --release 1.5
      2. ilab model serve --model-path ~/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8

      Using these vllm_args:

      • --tensor-parallel-size
      • '8'
      • --max-model-len
      • '16384'
      • --uvicorn-log-level
      • debug
      • --trust-remote-code

      Fails with:

      [cloud-user@ip-172-31-36-40 ~]$ ilab model serve --model-path ~/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8/
      Parameters:
      model_path: PosixPath('/var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8') [type: pathlib.PosixPath, src: commandline]
      model_id: None [type: None, src: default]
      gpu_layers: -1 [type: int, src: default_map]
      num_threads: None [type: None, src: default]
      max_ctx_size: 4096 [type: int, src: default_map]
      model_family: None [type: None, src: default]
      log_file: None [type: None, src: default]
      chat_template: 'auto' [type: str, src: default_map]
      backend: 'vllm' [type: str, src: default_map]
      gpus: 8 [type: int, src: default_map]
      host: '127.0.0.1' [type: str, src: default_map]
      port: 8000 [type: int, src: default_map]
      DEBUG 2025-05-09 21:57:13,187 instructlab.model.backends.backends:74: Auto-detecting backend for model /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8
      --- Logging error ---
      Traceback (most recent call last):
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/utils.py", line 786, in is_model_safetensors
      json.load(f)
      File "/usr/lib64/python3.11/json/_init_.py", line 293, in load
      return loads(fp.read(),
      ^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/json/_init_.py", line 346, in loads
      return _default_decoder.decode(s)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/json/decoder.py", line 337, in decode
      obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/json/decoder.py", line 353, in raw_decode
      obj, end = self.scan_once(s, idx)
      ^^^^^^^^^^^^^^^^^^^^^^
      json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 23 column 1 (char 447)

      During handling of the above exception, another exception occurred:

      Traceback (most recent call last):
      File "/usr/lib64/python3.11/logging/_init_.py", line 1110, in emit
      msg = self.format(record)
      ^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/logging/_init_.py", line 953, in format
      return fmt.format(record)
      ^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/log.py", line 19, in format
      return super().format(record)
      ^^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/logging/_init_.py", line 687, in format
      record.message = record.getMessage()
      ^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/logging/_init_.py", line 377, in getMessage
      msg = msg % self.args
      ~~~^~~~~~~~~~
      TypeError: not all arguments converted during string formatting
      Call stack:
      File "/opt/app-root/bin/ilab", line 8, in <module>
      sys.exit(ilab())
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1161, in _call_
      return self.main(*args, **kwargs)
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1082, in main
      rv = self.invoke(ctx)
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1443, in invoke
      return ctx.invoke(self.callback, **ctx.params)
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 788, in invoke
      return __callback(*args, **kwargs)
      File "/opt/app-root/lib64/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/clickext.py", line 356, in wrapper
      return f(*args, **kwargs)
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/cli/model/serve.py", line 149, in serve
      serve_backend(
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/serve_backend.py", line 76, in serve_backend
      backend = backends.get(model_path, backend)
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 76, in get
      auto_detected_backend, auto_detected_backend_reason = determine_backend(
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 30, in determine_backend
      if model_path.is_dir() and is_model_safetensors(model_path):
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/utils.py", line 788, in is_model_safetensors
      logger.debug("'%s' is not a valid JSON file: e", file, e)
      Message: "'%s' is not a valid JSON file: e"
      Arguments: (PosixPath('/var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8/special_tokens_map.json'), JSONDecodeError('Expecting property name enclosed in double quotes: line 23 column 1 (char 447)'))
      DEBUG 2025-05-09 21:57:16,151 instructlab.utils:804: GGUF Path /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8 is a directory
      Traceback (most recent call last):
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 76, in get
      auto_detected_backend, auto_detected_backend_reason = determine_backend(
      ^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 57, in determine_backend
      raise ValueError(
      ValueError: The model file /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8 is not a GGUF format nor a directory containing huggingface safetensors files. Cannot determine which backend to use.
      Please use a GGUF file for llama-cpp or a directory containing huggingface safetensors files for vllm.
      Note that vLLM is only supported on Linux.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/serve_backend.py", line 76, in serve_backend
      backend = backends.get(model_path, backend)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/backends.py", line 80, in get
      raise ValueError(f"Cannot determine which backend to use:

      {e}") from e
      ValueError: Cannot determine which backend to use: The model file /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8 is not a GGUF format nor a directory containing huggingface safetensors files. Cannot determine which backend to use.
      Please use a GGUF file for llama-cpp or a directory containing huggingface safetensors files for vllm.
      Note that vLLM is only supported on Linux.

      The above exception was the direct cause of the following exception:

      Traceback (most recent call last):
      File "/opt/app-root/bin/ilab", line 8, in <module>
      sys.exit(ilab())
      ^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1161, in _call_
      return self.main(*args, **kwargs)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1082, in main
      rv = self.invoke(ctx)
      ^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1443, in invoke
      return ctx.invoke(self.callback, **ctx.params)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 788, in invoke
      return __callback(*args, **kwargs)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/click/decorators.py", line 33, in new_func
      return f(get_current_context(), *args, **kwargs)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/clickext.py", line 356, in wrapper
      return f(*args, **kwargs)
      ^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/cli/model/serve.py", line 149, in serve
      serve_backend(
      File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/serve_backend.py", line 78, in serve_backend
      raise ValueError(f"Failed to determine backend: {e}

      ") from e
      ValueError: Failed to determine backend: Cannot determine which backend to use: The model file /var/home/cloud-user/.cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8 is not a GGUF format nor a directory containing huggingface safetensors files. Cannot determine which backend to use.
      Please use a GGUF file for llama-cpp or a directory containing huggingface safetensors files for vllm.
      Note that vLLM is only supported on Linux.
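
      Analysis: the traceback points at two distinct problems. First, backend auto-detection fails because special_tokens_map.json in the downloaded model directory does not parse as JSON ("Expecting property name enclosed in double quotes: line 23 column 1"), so is_model_safetensors() returns False and determine_backend() rejects the directory as neither GGUF nor a safetensors directory. Second, the debug call at instructlab/utils.py line 788 passes two arguments (file, e) to a format string with a single %s placeholder, which is why a "--- Logging error ---" TypeError block appears instead of a readable debug message. Below is a minimal diagnostic sketch (not part of ilab; the path and the expected failing file are taken from the traceback above) that checks each JSON file in the model directory the same way the detection code does:

      import json
      from pathlib import Path

      # Diagnostic sketch only: confirm which JSON file in the downloaded model
      # directory fails to parse. The directory path is copied from the traceback.
      model_dir = Path.home() / ".cache/instructlab/models/llama-4-maverick-17b-128e-instruct-fp8"

      for json_file in sorted(model_dir.glob("*.json")):
          try:
              with json_file.open(encoding="utf-8") as f:
                  json.load(f)
              print(f"OK       {json_file.name}")
          except json.JSONDecodeError as err:
              # Per the traceback, special_tokens_map.json is expected to fail here.
              print(f"INVALID  {json_file.name}: {err}")

      If special_tokens_map.json is malformed as shipped in the 1.5 image (rather than corrupted during download), that may also explain the ilab model list failure mentioned in the summary.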

      Expected behavior

      Serving should succeed: ilab model serve should auto-detect the vLLM backend for the downloaded safetensors model and serve it.

      Screenshots

      • Attached Image

      Device Info (please complete the following information):

      • Hardware Specs: [e.g. Apple M2 Pro Chip, 16 GB Memory, etc.]
      • OS Version: [e.g. Mac OS 14.4.1, Fedora Linux 40]
      • InstructLab Version: [output of ilab --version]
      • Provide the output of these two commands:
        • sudo bootc status --format json | jq .status.booted.image.image.image to print the name and tag of the bootc image, should look like registry.stage.redhat.io/rhelai1/bootc-intel-rhel9:1.3-1732894187
        • ilab system info to print detailed information about InstructLab version, OS, and hardware – including GPU / AI accelerator hardware

      [cloud-user@ip-172-31-36-40 ~]$ ilab --version
      ilab, version 0.26.1
      [cloud-user@ip-172-31-36-40 ~]$ ilab system info
      ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
      ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
      ggml_cuda_init: found 8 CUDA devices:
      Device 0: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 1: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 2: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 3: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 4: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 5: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 6: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 7: NVIDIA L40S, compute capability 8.9, VMM: yes
      Platform:
      sys.version: 3.11.7 (main, Jan 8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
      sys.platform: linux
      os.name: posix
      platform.release: 5.14.0-427.65.1.el9_4.x86_64
      platform.machine: x86_64
      platform.node: ip-172-31-36-40.us-east-2.compute.internal
      platform.python_version: 3.11.7
      os-release.ID: rhel
      os-release.VERSION_ID: 9.4
      os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
      memory.total: 1492.02 GB
      memory.available: 1481.33 GB
      memory.used: 2.87 GB

      InstructLab:
      instructlab.version: 0.26.1
      instructlab-dolomite.version: 0.2.0
      instructlab-eval.version: 0.5.1
      instructlab-quantize.version: 0.1.0
      instructlab-schema.version: 0.4.2
      instructlab-sdg.version: 0.8.2
      instructlab-training.version: 0.10.2

      Torch:
      torch.version: 2.6.0
      torch.backends.cpu.capability: AVX2
      torch.version.cuda: 12.4
      torch.version.hip: None
      torch.cuda.available: True
      torch.backends.cuda.is_built: True
      torch.backends.mps.is_built: False
      torch.backends.mps.is_available: False
      torch.cuda.bf16: True
      torch.cuda.current.device: 0
      torch.cuda.0.name: NVIDIA L40S
      torch.cuda.0.free: 43.9 GB
      torch.cuda.0.total: 44.3 GB
      torch.cuda.0.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.1.name: NVIDIA L40S
      torch.cuda.1.free: 43.9 GB
      torch.cuda.1.total: 44.3 GB
      torch.cuda.1.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.2.name: NVIDIA L40S
      torch.cuda.2.free: 43.9 GB
      torch.cuda.2.total: 44.3 GB
      torch.cuda.2.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.3.name: NVIDIA L40S
      torch.cuda.3.free: 43.9 GB
      torch.cuda.3.total: 44.3 GB
      torch.cuda.3.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.4.name: NVIDIA L40S
      torch.cuda.4.free: 43.9 GB
      torch.cuda.4.total: 44.3 GB
      torch.cuda.4.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.5.name: NVIDIA L40S
      torch.cuda.5.free: 43.9 GB
      torch.cuda.5.total: 44.3 GB
      torch.cuda.5.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.6.name: NVIDIA L40S
      torch.cuda.6.free: 43.9 GB
      torch.cuda.6.total: 44.3 GB
      torch.cuda.6.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.7.name: NVIDIA L40S
      torch.cuda.7.free: 43.9 GB
      torch.cuda.7.total: 44.3 GB
      torch.cuda.7.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)

      llama_cpp_python:
      llama_cpp_python.version: 0.3.6
      llama_cpp_python.supports_gpu_offload: True

      Bug impact

      • Please provide information on the impact of this bug to the end user.

      Known workaround

      • Please add any known workarounds.

      Additional context

      • <your text here>

              Assignee: Unassigned
              Reporter: Dan McPherson (dmcphers@redhat.com)
              Votes: 0
              Watchers: 2