- Bug
- Resolution: Done
- RHELAI 1.3 GA
- Impediment
- Approved
To Reproduce
Steps to reproduce the behavior:
- Start an instructlab-amd-rhel9:1.3-1732663637 container
- Initialize the config and download the appropriate model
- Run ilab model serve
- See the error:
$ ilab model serve
INFO 2024-11-27 01:47:51,117 instructlab.model.serve_backend:56: Using model '/opt/app-root/src/.cache/instructlab/models/granite-8b-lab-v1' with -1 gpu-layers and 4096 max context size.
INFO 2024-11-27 01:47:51,117 instructlab.model.serve_backend:88: '--gpus' flag used alongside '--tensor-parallel-size' in the vllm_args section of the config file. Using value of the --gpus flag.
INFO 2024-11-27 01:47:51,118 instructlab.model.backends.vllm:313: vLLM starting up on pid 132 at http://127.0.0.1:8000/v1
INFO 11-27 01:47:53 importing.py:10] Triton not installed; certain GPU-related functions will not be available.
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
Traceback (most recent call last):
  File "<frozen runpy>", line 189, in _run_module_as_main
  File "<frozen runpy>", line 112, in _get_module_details
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/__init__.py", line 6, in <module>
    from vllm.entrypoints.fast_sync_llm import FastSyncLLM
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/fast_sync_llm.py", line 9, in <module>
    from vllm.executor.multiproc_gpu_executor import MultiprocessingGPUExecutor
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 16, in <module>
    from vllm.triton_utils import maybe_set_triton_cache_manager
ImportError: cannot import name 'maybe_set_triton_cache_manager' from 'vllm.triton_utils' (/opt/app-root/lib64/python3.11/site-packages/vllm/triton_utils/__init__.py)
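The import failure can be checked in isolation, without serving a model. A minimal reproducer, assuming the same container environment (the import path is taken directly from the traceback above):

# Minimal standalone reproducer; run inside the instructlab-amd-rhel9 container.
try:
    from vllm.triton_utils import maybe_set_triton_cache_manager  # noqa: F401
    print("import OK: Triton present and symbol exported")
except ImportError as exc:
    print(f"reproduced: {exc}")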
Expected behavior
- vLLM should start serving the default model.
Device Info (please complete the following information):
- Hardware Specs: x86_64 with AMD GPU (reproduced with 4x MI210 and 8x MI300X)
- OS Version: instructlab-amd-rhel9:1.3-1732663637 running on RHEL9
- Python Version: Python 3.11.7
- InstructLab Version: ilab, version 0.21.0
Additional context
- This is the traceback that occurs after the fix from RHELAI-2400.
- Is related to: RHELAI-2400 "RuntimeError: operator torchvision::nms does not exist" (Closed)
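- The earlier log line from importing.py ("Triton not installed; certain GPU-related functions will not be available") points at a likely mechanism: vllm.triton_utils appears to export maybe_set_triton_cache_manager only when Triton is importable, while multiproc_gpu_executor.py imports the name unconditionally. Below is a runnable illustration of that gated-export pattern using stand-in definitions; the actual vLLM internals here are an assumption, not verified against this build.

import importlib.util

# Stand-in for a HAS_TRITON gate (hypothetical reconstruction): True only
# when the triton package can be found in the environment.
HAS_TRITON = importlib.util.find_spec("triton") is not None

if HAS_TRITON:
    def maybe_set_triton_cache_manager() -> None:
        """Stand-in: the name only exists (and is exportable) when Triton is present."""

# A consumer that uses the name unconditionally, as the traceback shows
# multiproc_gpu_executor.py doing at import time, fails when Triton is absent:
try:
    maybe_set_triton_cache_manager  # noqa: B018
    print("symbol available: Triton is installed")
except NameError:
    print("symbol missing without Triton -> the ImportError seen above")

If that is the cause, a defensive fix on the consumer side would wrap the import in try/except ImportError and skip the cache-manager setup when Triton is unavailable (a sketch of one option, not necessarily the fix shipped in RHEL AI).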