AI Platform Core Components / AIPCC-1435

registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503:1.5 fails to serve


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Version: rhelai-1.5
    • Component: Model Validation

      To Reproduce

      Steps to reproduce the behavior:

      1. ilab model download docker://registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503 --release 1.5
      2. ilab model serve --model-path ~/.cache/instructlab/models/mistral-small-3-1-24b-instruct-2503/

      Using these vllm_args:

      • --tensor-parallel-size '1'
      • --max-model-len '16384'
      • --uvicorn-log-level debug
      • --trust-remote-code
      • --tokenizer-mode mistral
      • --config-format mistral
      • --load-format mistral
      • --tool-call-parser mistral
      • --enable-auto-tool-choice
      • --limit-mm-per-prompt image=10

      Specified from: https://docs.google.com/spreadsheets/d/1NGPhJV0pk7jYuAFOHk7aWPomX7Svb_-Xa-OVUVtpNbM/edit?gid=1505755754#gid=1505755754
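      For reference, the traceback below shows vLLM's OpenAI API server being launched as a Python module, so the failure can be reproduced outside of ilab with roughly the following standalone invocation. This is a sketch only: the model path is taken from step 2 above, and the exact command ilab constructs may differ.

      python -m vllm.entrypoints.openai.api_server \
        --model ~/.cache/instructlab/models/mistral-small-3-1-24b-instruct-2503/ \
        --tensor-parallel-size 1 \
        --max-model-len 16384 \
        --uvicorn-log-level debug \
        --trust-remote-code \
        --tokenizer-mode mistral \
        --config-format mistral \
        --load-format mistral \
        --tool-call-parser mistral \
        --enable-auto-tool-choice \
        --limit-mm-per-prompt image=10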

      Fails with:

      Traceback (most recent call last):
      File "<frozen runpy>", line 198, in _run_module_as_main
      File "<frozen runpy>", line 88, in _run_code
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1121, in <module>
      uvloop.run(run_server(args))
      File "/opt/app-root/lib64/python3.11/site-packages/uvloop/_init_.py", line 105, in run
      return runner.run(wrapper())
      ^^^^^^^^^^^^^^^^^^^^^
      File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
      return self._loop.run_until_complete(task)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
      File "/opt/app-root/lib64/python3.11/site-packages/uvloop/_init_.py", line 61, in wrapper
      return await main
      ^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1069, in run_server
      async with build_async_engine_client(args) as engine_client:
      File "/usr/lib64/python3.11/contextlib.py", line 210, in _aenter_
      return await anext(self.gen)
      ^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
      async with build_async_engine_client_from_engine_args(
      File "/usr/lib64/python3.11/contextlib.py", line 210, in _aenter_
      return await anext(self.gen)
      ^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 178, in build_async_engine_client_from_engine_args
      async_llm = AsyncLLM.from_vllm_config(
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 136, in from_vllm_config
      return cls(
      ^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/engine/async_llm.py", line 83, in _init_
      self.tokenizer = init_tokenizer_from_configs(
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/transformers_utils/tokenizer_group/_init_.py", line 32, in init_tokenizer_from_configs
      return get_tokenizer_group(parallel_config.tokenizer_pool_config,
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/transformers_utils/tokenizer_group/_init_.py", line 53, in get_tokenizer_group
      return tokenizer_cls.from_config(tokenizer_pool_config, **init_kwargs)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 33, in from_config
      return cls(**init_kwargs)
      ^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 25, in _init_
      self.tokenizer = get_tokenizer(self.tokenizer_id, **tokenizer_config)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/transformers_utils/tokenizer.py", line 208, in get_tokenizer
      tokenizer = MistralTokenizer.from_pretrained(str(tokenizer_name),
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/transformers_utils/tokenizers/mistral.py", line 224, in from_pretrained
      tokenizer_file_name = find_tokenizer_file(
      ^^^^^^^^^^^^^^^^^^^^
      File "/opt/app-root/lib64/python3.11/site-packages/vllm/transformers_utils/tokenizers/mistral.py", line 139, in find_tokenizer_file
      raise OSError(
      OSError: Found 0 files matching the pattern: `^tokenizer\.model\.v.$|^tekken\.json$|^tokenizer\.mm\.model\.v.$`. Make sure that a Mistral tokenizer is present in ['README.md', 'SYSTEM_PROMPT.txt', 'chat_template.json', 'config.json', 'generation_config.json', 'model-00001-of-00010.safetensors', 'model-00002-of-00010.safetensors', 'model-00003-of-00010.safetensors', 'model-00004-of-00010.safetensors', 'model-00006-of-00010.safetensors', 'model-00007-of-00010.safetensors', 'model-00008-of-00010.safetensors', 'model-00009-of-00010.safetensors', 'model-00010-of-00010.safetensors', 'params.json', 'processor_config.json', 'special_tokens_map.json', 'tokenizer.json', 'tokenizer_config.json'].
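      The failure comes from vLLM's Mistral tokenizer loader: with --tokenizer-mode mistral it only accepts tokenizer files named tokenizer.model.v*, tekken.json, or tokenizer.mm.model.v*, while the downloaded model directory ships tokenizer.json / tokenizer_config.json instead. A quick way to confirm this against the local download (path from the repro steps; a sketch, not an official diagnostic):

      # List the model directory and look for a Mistral-format tokenizer file;
      # the grep uses the exact pattern from the error message above.
      ls ~/.cache/instructlab/models/mistral-small-3-1-24b-instruct-2503/ \
        | grep -E '^tokenizer\.model\.v.$|^tekken\.json$|^tokenizer\.mm\.model\.v.$'
      # On an affected system this prints nothing (grep exits with status 1).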

      Expected behavior

      The model should load and serve successfully.
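      For context, a successful ilab model serve would expose an OpenAI-compatible endpoint that accepts requests such as the one below. The host, port, and served model name are assumptions based on ilab defaults; adjust them to the configured serve address.

      curl -s http://127.0.0.1:8000/v1/chat/completions \
        -H "Content-Type: application/json" \
        -d '{"model": "mistral-small-3-1-24b-instruct-2503",
             "messages": [{"role": "user", "content": "Hello"}]}'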

      Screenshots

      • Attached Image

      Device Info (please complete the following information):

      • Hardware Specs: [e.g. Apple M2 Pro Chip, 16 GB Memory, etc.]
      • OS Version: [e.g. Mac OS 14.4.1, Fedora Linux 40]
      • InstructLab Version: [output of ilab --version]
      • Provide the output of these two commands:
        • sudo bootc status --format json | jq .status.booted.image.image.image to print the name and tag of the bootc image, should look like registry.stage.redhat.io/rhelai1/bootc-intel-rhel9:1.3-1732894187
        • ilab system info to print detailed information about InstructLab version, OS, and hardware – including GPU / AI accelerator hardware

      [cloud-user@ip-172-31-36-40 ~]$ ilab --version
      ilab, version 0.26.1
      [cloud-user@ip-172-31-36-40 ~]$ ilab system info
      ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
      ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
      ggml_cuda_init: found 8 CUDA devices:
      Device 0: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 1: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 2: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 3: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 4: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 5: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 6: NVIDIA L40S, compute capability 8.9, VMM: yes
      Device 7: NVIDIA L40S, compute capability 8.9, VMM: yes
      Platform:
      sys.version: 3.11.7 (main, Jan 8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
      sys.platform: linux
      os.name: posix
      platform.release: 5.14.0-427.65.1.el9_4.x86_64
      platform.machine: x86_64
      platform.node: ip-172-31-36-40.us-east-2.compute.internal
      platform.python_version: 3.11.7
      os-release.ID: rhel
      os-release.VERSION_ID: 9.4
      os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
      memory.total: 1492.02 GB
      memory.available: 1481.33 GB
      memory.used: 2.87 GB

      InstructLab:
      instructlab.version: 0.26.1
      instructlab-dolomite.version: 0.2.0
      instructlab-eval.version: 0.5.1
      instructlab-quantize.version: 0.1.0
      instructlab-schema.version: 0.4.2
      instructlab-sdg.version: 0.8.2
      instructlab-training.version: 0.10.2

      Torch:
      torch.version: 2.6.0
      torch.backends.cpu.capability: AVX2
      torch.version.cuda: 12.4
      torch.version.hip: None
      torch.cuda.available: True
      torch.backends.cuda.is_built: True
      torch.backends.mps.is_built: False
      torch.backends.mps.is_available: False
      torch.cuda.bf16: True
      torch.cuda.current.device: 0
      torch.cuda.0.name: NVIDIA L40S
      torch.cuda.0.free: 43.9 GB
      torch.cuda.0.total: 44.3 GB
      torch.cuda.0.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.1.name: NVIDIA L40S
      torch.cuda.1.free: 43.9 GB
      torch.cuda.1.total: 44.3 GB
      torch.cuda.1.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.2.name: NVIDIA L40S
      torch.cuda.2.free: 43.9 GB
      torch.cuda.2.total: 44.3 GB
      torch.cuda.2.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.3.name: NVIDIA L40S
      torch.cuda.3.free: 43.9 GB
      torch.cuda.3.total: 44.3 GB
      torch.cuda.3.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.4.name: NVIDIA L40S
      torch.cuda.4.free: 43.9 GB
      torch.cuda.4.total: 44.3 GB
      torch.cuda.4.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.5.name: NVIDIA L40S
      torch.cuda.5.free: 43.9 GB
      torch.cuda.5.total: 44.3 GB
      torch.cuda.5.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.6.name: NVIDIA L40S
      torch.cuda.6.free: 43.9 GB
      torch.cuda.6.total: 44.3 GB
      torch.cuda.6.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)
      torch.cuda.7.name: NVIDIA L40S
      torch.cuda.7.free: 43.9 GB
      torch.cuda.7.total: 44.3 GB
      torch.cuda.7.capability: 8.9 (see https://developer.nvidia.com/cuda-gpus#compute)

      llama_cpp_python:
      llama_cpp_python.version: 0.3.6
      llama_cpp_python.supports_gpu_offload: True

      Bug impact

      • The registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503:1.5 model cannot be served with ilab model serve, so it is unusable on RHEL AI 1.5.

      Known workaround

      • Please add any known workarounds.

      Additional context

      • <your text here>

              Assignee: Unassigned
              Reporter: Dan McPherson (dmcphers@redhat.com)