-
Bug
-
Resolution: Done
-
Major
-
RHAIIS-3.1
-
None
-
False
-
-
False
-
-
-
AP Sprint 9
While testing the ibm-granite/granite-vision-3.1-2b-preview model on the GH200 machine, we observed that the model deployment fails during engine startup with "ModuleNotFoundError: No module named 'xformers'".
Notes:
- The model deploys successfully on the NVIDIA A100 (logs attached).
- The model is also deploying successfully on the AMD MI300x.
- We verified that the package xformers (v0.1.19) is already present in the container image.
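One way to cross-check the last note is to ask the Python interpreter itself whether it can resolve xformers, since a package can be present in an image without being importable by the interpreter that vLLM runs under. The sketch below uses only the standard library; `probe_package` is a hypothetical helper name, and it assumes the script is executed with the same interpreter that appears in the traceback paths (under /opt/app-root). The architecture comment reflects typical hosts and is an assumption, not something taken from the attached logs.

```python
import importlib.metadata
import importlib.util
import platform


def probe_package(name: str) -> dict:
    """Report whether `name` is importable here and which version is installed."""
    # find_spec returns None when a top-level module cannot be located.
    spec = importlib.util.find_spec(name)
    try:
        version = importlib.metadata.version(name)
    except importlib.metadata.PackageNotFoundError:
        # No installed distribution metadata under this name.
        version = None
    return {
        "package": name,
        "importable": spec is not None,
        "location": getattr(spec, "origin", None),
        "version": version,
        # Typically "aarch64" on GH200 hosts and "x86_64" on most A100 hosts.
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    print(probe_package("xformers"))
```

Comparing the output from the GH200 container with the output from the A100 container would show whether the package is missing for that architecture or only hidden from this interpreter.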
GH200 logs:
podman run -ti --rm --pull=newer --userns=keep-id:uid=1001 --shm-size=4g -p 8000:8000 --env "HF_HUB_OFFLINE=0" -v ./home/rhaiis-cache:/opt/app-root/src/.cache:Z --name=rhaiis --device=nvidia.com/gpu=all quay.io/aipcc/rhaiis/cuda-ubi9:3.1-0-1750864644 --model ibm-granite/granite-vision-3.1-2b-preview --max-model-len 10000 --enable-chunked-prefill
INFO 06-26 11:45:34 [__init__.py:243] Automatically detected platform cuda.
INFO 06-26 11:45:35 [__init__.py:31] Available plugins for group vllm.general_plugins:
INFO 06-26 11:45:35 [__init__.py:33] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
INFO 06-26 11:45:35 [__init__.py:36] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 06-26 11:45:35 [api_server.py:1289] vLLM API server version 0.9.0.1
INFO 06-26 11:45:36 [cli_args.py:300] non-default args: {'model': 'ibm-granite/granite-vision-3.1-2b-preview', 'max_model_len': 10000, 'enable_chunked_prefill': True}
INFO 06-26 11:45:42 [config.py:793] This model supports multiple tasks: {'classify', 'reward', 'generate', 'embed', 'score'}. Defaulting to 'generate'.
INFO 06-26 11:45:42 [config.py:2118] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 06-26 11:45:43 [core.py:438] Waiting for init message from front-end.
INFO 06-26 11:45:43 [core.py:65] Initializing a V1 LLM engine (v0.9.0.1) with config: model='ibm-granite/granite-vision-3.1-2b-preview', speculative_config=None, tokenizer='ibm-granite/granite-vision-3.1-2b-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=10000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=ibm-granite/granite-vision-3.1-2b-preview, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level": 3, "custom_ops": ["none"], "splitting_ops": ["vllm.unified_attention", "vllm.unified_attention_with_output"], "compile_sizes": [], "inductor_compile_config": {"enable_auto_functionalized_v2": false}, "use_cudagraph": true, "cudagraph_num_of_warmups": 1, "cudagraph_capture_sizes": [512, 504, 496, 488, 480, 472, 464, 456, 448, 440, 432, 424, 416, 408, 400, 392, 384, 376, 368, 360, 352, 344, 336, 328, 320, 312, 304, 296, 288, 280, 272, 264, 256, 248, 240, 232, 224, 216, 208, 200, 192, 184, 176, 168, 160, 152, 144, 136, 128, 120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 4, 2, 1], "max_capture_size": 512}
WARNING 06-26 11:45:43 [utils.py:2671] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xfffd04673c80>
INFO 06-26 11:45:44 [parallel_state.py:1064] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
INFO 06-26 11:45:44 [topk_topp_sampler.py:48] Using FlashInfer for top-p & top-k sampling.
INFO 06-26 11:45:45 [gpu_model_runner.py:1531] Starting to load model ibm-granite/granite-vision-3.1-2b-preview...
INFO 06-26 11:45:45 [cuda.py:217] Using Flash Attention backend on V1 engine.
INFO 06-26 11:45:45 [backends.py:35] Using InductorAdaptor
INFO 06-26 11:45:45 [weight_utils.py:291] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards: 0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 50% Completed | 1/2 [00:00<00:00, 2.34it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 4.16it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00, 3.72it/s]
INFO 06-26 11:45:46 [default_loader.py:280] Loading weights took 0.56 seconds
INFO 06-26 11:45:46 [gpu_model_runner.py:1549] Model loading took 5.5552 GiB and 1.013771 seconds
INFO 06-26 11:45:46 [gpu_model_runner.py:1863] Encoder cache will be initialized with a budget of 8289 tokens, and profiled with 1 image items of the maximum feature size.
ERROR 06-26 11:45:46 [core.py:500] EngineCore failed to start.
ERROR 06-26 11:45:46 [core.py:500] Traceback (most recent call last): ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 491, in run_engine_core ERROR 06-26 11:45:46 [core.py:500] engine_core = EngineCoreProc(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 390, in __init__ ERROR 06-26 11:45:46 [core.py:500] super().__init__(vllm_config, executor_class, log_stats, ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 78, in __init__ ERROR 06-26 11:45:46 [core.py:500] self._initialize_kv_caches(vllm_config) ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 137, in _initialize_kv_caches ERROR 06-26 11:45:46 [core.py:500] available_gpu_memory = self.model_executor.determine_available_memory() ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory ERROR 06-26 11:45:46 [core.py:500] output = self.collective_rpc("determine_available_memory") ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc ERROR 06-26 11:45:46 [core.py:500] answer = run_method(self.driver_worker, method, args, kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/utils.py", line 2605, in run_method ERROR 06-26 11:45:46 [core.py:500] return func(*args, **kwargs) ERROR 06-26 
11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context ERROR 06-26 11:45:46 [core.py:500] return func(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory ERROR 06-26 11:45:46 [core.py:500] self.model_runner.profile_run() ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1886, in profile_run ERROR 06-26 11:45:46 [core.py:500] dummy_encoder_outputs = self.model.get_multimodal_embeddings( ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 485, in get_multimodal_embeddings ERROR 06-26 11:45:46 [core.py:500] vision_embeddings = self._process_image_input(image_input) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 460, in _process_image_input ERROR 06-26 11:45:46 [core.py:500] patch_embeddings = self._process_image_pixels(image_input) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 437, in _process_image_pixels ERROR 06-26 11:45:46 [core.py:500] stacked_image_features = self._image_pixels_to_features( ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 348, in 
_image_pixels_to_features ERROR 06-26 11:45:46 [core.py:500] image_features = vision_tower( ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl ERROR 06-26 11:45:46 [core.py:500] return self._call_impl(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl ERROR 06-26 11:45:46 [core.py:500] return forward_call(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 478, in forward ERROR 06-26 11:45:46 [core.py:500] return self.vision_model( ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl ERROR 06-26 11:45:46 [core.py:500] return self._call_impl(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl ERROR 06-26 11:45:46 [core.py:500] return forward_call(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 429, in forward ERROR 06-26 11:45:46 [core.py:500] encoder_outputs = self.encoder( ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl ERROR 06-26 11:45:46 [core.py:500] return 
self._call_impl(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl ERROR 06-26 11:45:46 [core.py:500] return forward_call(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 318, in forward ERROR 06-26 11:45:46 [core.py:500] hidden_states, _ = encoder_layer(hidden_states) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl ERROR 06-26 11:45:46 [core.py:500] return self._call_impl(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl ERROR 06-26 11:45:46 [core.py:500] return forward_call(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 273, in forward ERROR 06-26 11:45:46 [core.py:500] hidden_states, _ = self.self_attn(hidden_states=hidden_states) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl ERROR 06-26 11:45:46 [core.py:500] return self._call_impl(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl ERROR 
06-26 11:45:46 [core.py:500] return forward_call(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 191, in forward ERROR 06-26 11:45:46 [core.py:500] out = self.attn(query_states, key_states, value_states) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl ERROR 06-26 11:45:46 [core.py:500] return self._call_impl(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl ERROR 06-26 11:45:46 [core.py:500] return forward_call(*args, **kwargs) ERROR 06-26 11:45:46 [core.py:500] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 06-26 11:45:46 [core.py:500] File "/opt/app-root/lib64/python3.12/site-packages/vllm/attention/layer.py", line 316, in forward ERROR 06-26 11:45:46 [core.py:500] from xformers import ops as xops ERROR 06-26 11:45:46 [core.py:500] ModuleNotFoundError: No module named 'xformers' Process EngineCore_0: Traceback (most recent call last): File "/usr/lib64/python3.12/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/usr/lib64/python3.12/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 504, in run_engine_core raise e File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 491, in run_engine_core engine_core = EngineCoreProc(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 390, in __init__ super().__init__(vllm_config, 
executor_class, log_stats, File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 78, in __init__ self._initialize_kv_caches(vllm_config) File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 137, in _initialize_kv_caches available_gpu_memory = self.model_executor.determine_available_memory() ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory output = self.collective_rpc("determine_available_memory") ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc answer = run_method(self.driver_worker, method, args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/utils.py", line 2605, in run_method return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory self.model_runner.profile_run() File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1886, in profile_run dummy_encoder_outputs = self.model.get_multimodal_embeddings( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 485, in get_multimodal_embeddings vision_embeddings = self._process_image_input(image_input) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 460, in _process_image_input patch_embeddings = self._process_image_pixels(image_input) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 437, in _process_image_pixels stacked_image_features = self._image_pixels_to_features( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 348, in _image_pixels_to_features image_features = vision_tower( ^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 478, in forward return self.vision_model( ^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 429, in forward encoder_outputs = self.encoder( ^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 318, in forward hidden_states, _ = encoder_layer(hidden_states) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 273, in forward hidden_states, _ = self.self_attn(hidden_states=hidden_states) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 191, in forward out = self.attn(query_states, key_states, value_states) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/attention/layer.py", line 316, in forward from xformers import ops as xops ModuleNotFoundError: No module named 'xformers' Traceback (most recent call last): File "<frozen runpy>", line 198, in _run_module_as_main File "<frozen runpy>", line 88, in _run_code File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1376, in <module> uvloop.run(run_server(args)) File 
"/opt/app-root/lib64/python3.12/site-packages/uvloop/__init__.py", line 109, in run return __asyncio.run( ^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run return runner.run(main) ^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run return self._loop.run_until_complete(task) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete File "/opt/app-root/lib64/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper return await main ^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1324, in run_server async with build_async_engine_client(args) as engine_client: File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 153, in build_async_engine_client async with build_async_engine_client_from_engine_args( File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__ return await anext(self.gen) ^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 185, in build_async_engine_client_from_engine_args async_llm = AsyncLLM.from_vllm_config( ^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 157, in from_vllm_config return cls( ^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 123, in __init__ self.engine_core = core_client_class( ^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 734, in __init__ super().__init__( File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 418, in __init__ self._wait_for_engine_startup(output_address, parallel_config) File 
"/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 484, in _wait_for_engine_startup raise RuntimeError("Engine core initialization failed. " RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
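The final RuntimeError only says "See root cause above", and the underlying exception is easy to miss in a log this long. As a triage aid, the following minimal sketch pulls the missing module names out of a saved copy of the log. `missing_modules` is a hypothetical helper, and the pattern only covers the ModuleNotFoundError message format seen in this log.

```python
import re

# Matches e.g. "ModuleNotFoundError: No module named 'xformers'"
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")


def missing_modules(log_text: str) -> list[str]:
    """Return the distinct module names reported missing, in first-seen order."""
    seen: list[str] = []
    for name in MISSING_MODULE.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen
```

Because the same traceback is printed both by the engine core logger and by the multiprocessing bootstrap, the helper removes duplicates so each missing module is listed once.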