- Type: Bug
- Resolution: Done
- Priority: Critical
- Fix Version: RHELAI 1.3 GA
- Security Level: None
To Reproduce
Steps to reproduce the behavior:
- Use registry.stage.redhat.io/rhelai1/bootc-intel-rhel9:1.3-1732202338
- Apply workaround from https://issues.redhat.com/browse/RHELAI-2390 for downloading models
- Run `ilab model serve`
- See error
bash-5.1# ILAB_HOME=/var/home/devcloud ilab model serve
INFO 2024-11-26 13:35:12,849 instructlab.model.serve_backend:56: Using model '/var/home/devcloud/.cache/instructlab/models/granite-7b-redhat-lab' with -1 gpu-layers and 4096 max context size.
INFO 2024-11-26 13:35:12,849 instructlab.model.serve_backend:88: '--gpus' flag used alongside '--tensor-parallel-size' in the vllm_args section of the config file. Using value of the --gpus flag.
INFO 2024-11-26 13:35:12,851 instructlab.model.backends.vllm:313: vLLM starting up on pid 4 at http://127.0.0.1:8000/v1
/usr/lib64/python3.11/inspect.py:389: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
  return isinstance(object, types.FunctionType)
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 33, in <module>
    from vllm.entrypoints.openai.serving_chat import OpenAIServingChat
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/serving_chat.py", line 28, in <module>
    from vllm.model_executor.guided_decoding import (
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/__init__.py", line 6, in <module>
    from vllm.model_executor.guided_decoding.lm_format_enforcer_decoding import (
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/lm_format_enforcer_decoding.py", line 15, in <module>
    from vllm.model_executor.guided_decoding.outlines_decoding import (
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/outlines_decoding.py", line 13, in <module>
    from vllm.model_executor.guided_decoding.outlines_logits_processors import (
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/guided_decoding/outlines_logits_processors.py", line 24, in <module>
    from outlines.caching import cache
  File "/opt/app-root/lib64/python3.11/site-packages/outlines/__init__.py", line 2, in <module>
    import outlines.generate
  File "/opt/app-root/lib64/python3.11/site-packages/outlines/generate/__init__.py", line 2, in <module>
    from .cfg import cfg
  File "/opt/app-root/lib64/python3.11/site-packages/outlines/generate/cfg.py", line 5, in <module>
    from outlines.models import OpenAI
  File "/opt/app-root/lib64/python3.11/site-packages/outlines/models/__init__.py", line 10, in <module>
    from .exllamav2 import ExLlamaV2Model, exl2
  File "/opt/app-root/lib64/python3.11/site-packages/outlines/models/exllamav2.py", line 9, in <module>
    from .transformers import TransformerTokenizer
  File "/opt/app-root/lib64/python3.11/site-packages/outlines/models/transformers.py", line 3, in <module>
    from datasets.fingerprint import Hasher
  File "/opt/app-root/lib64/python3.11/site-packages/datasets/__init__.py", line 17, in <module>
    from .arrow_dataset import Dataset
  File "/opt/app-root/lib64/python3.11/site-packages/datasets/arrow_dataset.py", line 60, in <module>
    import pyarrow as pa
  File "/opt/app-root/lib64/python3.11/site-packages/pyarrow/__init__.py", line 65, in <module>
    import pyarrow.lib as _lib
ImportError: /lib64/libjemalloc.so.2: cannot allocate memory in static TLS block
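For context: the error fires when pyarrow's native extension is dlopen'ed after process startup, at which point glibc's static TLS reserve is already exhausted and libjemalloc's thread-local storage no longer fits. The usual mitigation for this class of failure, and the one the linked ticket RHELAI-2417 tracks removing the need for, is to preload libjemalloc so its TLS is reserved at process startup instead. A minimal sketch, assuming the library path shown in the ImportError:

```bash
# Workaround sketch, not an official fix: preloading libjemalloc makes the
# dynamic loader reserve its TLS in the static TLS block at startup, so the
# later dlopen of pyarrow.lib no longer has to carve TLS out of the
# already-exhausted static reserve.
# Library path taken from the traceback above; verify it on the image.
LD_PRELOAD=/lib64/libjemalloc.so.2 ILAB_HOME=/var/home/devcloud ilab model serve
```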
Expected behavior
- `ilab model serve` starts the vLLM server at http://127.0.0.1:8000/v1 and serves the model, with no ImportError from pyarrow.
Screenshots
- Attached Image
Device Info (please complete the following information):
- Hardware Specs: Intel Gaudi3
- OS Version: RHEL AI 1.3
- Python Version: [output of `python --version`]
- InstructLab Version: 0.21
Additional context
- Issue seen before: https://issues.redhat.com/browse/RHELAI-1703
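Since the traceback dies inside `import pyarrow`, the failure should be reproducible without ilab at all. A minimal isolation check, assuming `python3` on the image is the same interpreter vLLM runs under:

```bash
# Hypothetical isolation test: if this bare import also fails with
# "cannot allocate memory in static TLS block", the problem is in the
# image's jemalloc/pyarrow linkage rather than in instructlab or vLLM.
python3 -c "import pyarrow"
```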
relates to
- RHELAI-2417 Address jemalloc "cannot allocate memory in static TLS block" without LD_PRELOAD (status: New)
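One LD_PRELOAD-free direction for RHELAI-2417 (an assumption on my part, not something stated in that ticket) is glibc's surplus static TLS tunable, which enlarges the reserve that dlopen'ed libraries such as libjemalloc draw from:

```bash
# Sketch, assuming the image's glibc (RHEL 9 ships 2.34) honors the rtld
# tunables: raise the surplus static TLS reserved at startup so
# libjemalloc's TLS still fits when pyarrow.lib is dlopen'ed.
# 4096 is an illustrative value; see the glibc tunables manual for details.
GLIBC_TUNABLES=glibc.rtld.optional_static_tls=4096 ilab model serve
```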