Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: RHAIIS-3.1
Affects Version/s: RHAIIS-3.1
Component/s: Wheel Package Index
Labels: None
Blocked: False
Ready: False
Sprint: AP Sprint 9

While testing the ibm-granite/granite-vision-3.1-2b-preview model on the GH200 machine, we observed that the model deployment fails during engine startup with ModuleNotFoundError: No module named 'xformers'.

Notes:

The model is deploying successfully on the NVIDIA A100 (I have attached the logs).
The model is also deploying successfully on the AMD MI300x.
We verified that the package xformers (v0.1.19) is already present in the container image.
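
One way to double-check the last note is to ask the interpreter that vLLM actually runs under (the traceback paths point at /opt/app-root/lib64/python3.12) whether it can see xformers, rather than relying on an image-level package listing. The following is a minimal sketch, assuming it is executed with that same Python inside the GH200 container; `describe_package` is a hypothetical helper name and is not part of vLLM or RHAIIS.

```python
import importlib.metadata
import importlib.util
import platform
import sys
from typing import Optional


def describe_package(module_name: str, dist_name: Optional[str] = None) -> dict:
    """Report whether a top-level module is importable by *this* interpreter.

    module_name is the import name (e.g. "xformers"); dist_name is the
    installed distribution name, assumed to match the import name if omitted.
    """
    dist_name = dist_name or module_name

    # find_spec returns None when the import system cannot locate the module.
    spec = importlib.util.find_spec(module_name)

    # The distribution metadata can exist (or be missing) independently of
    # whether the module itself is importable, so check it separately.
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None

    return {
        "module": module_name,
        "importable": spec is not None,
        "origin": getattr(spec, "origin", None),
        "dist_version": version,
        "python": sys.executable,
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in describe_package("xformers").items():
        print(f"{key}: {value}")
```

Running this in the A100 and GH200 containers and comparing `importable`, `dist_version`, and `machine` should show whether the package is absent only from the image variant used on the GH200, which pairs the GPU with an Arm-based CPU.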

GH200 logs:

podman run -ti --rm --pull=newer --userns=keep-id:uid=1001 --shm-size=4g -p 8000:8000 --env "HF_HUB_OFFLINE=0" -v ./home/rhaiis-cache:/opt/app-root/src/.cache:Z --name=rhaiis --device=nvidia.com/gpu=all quay.io/aipcc/rhaiis/cuda-ubi9:3.1-0-1750864644 --model ibm-granite/granite-vision-3.1-2b-preview --max-model-len 10000 --enable-chunked-prefill
INFO 06-26 11:45:34 [__init__.py:243] Automatically detected platform cuda.
INFO 06-26 11:45:35 [__init__.py:31] Available plugins for group vllm.general_plugins:
INFO 06-26 11:45:35 [__init__.py:33] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
INFO 06-26 11:45:35 [__init__.py:36] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 06-26 11:45:35 [api_server.py:1289] vLLM API server version 0.9.0.1
INFO 06-26 11:45:36 [cli_args.py:300] non-default args: {'model': 'ibm-granite/granite-vision-3.1-2b-preview', 'max_model_len': 10000, 'enable_chunked_prefill': True}
INFO 06-26 11:45:42 [config.py:793] This model supports multiple tasks: {'classify', 'reward', 'generate', 'embed', 'score'}. Defaulting to 'generate'.
INFO 06-26 11:45:42 [config.py:2118] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 06-26 11:45:43 [core.py:438] Waiting for init message from front-end.
INFO 06-26 11:45:43 [core.py:65] Initializing a V1 LLM engine (v0.9.0.1) with config: model='ibm-granite/granite-vision-3.1-2b-preview', speculative_config=None, tokenizer='ibm-granite/granite-vision-3.1-2b-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=10000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=ibm-granite/granite-vision-3.1-2b-preview, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level": 3, "custom_ops": ["none"], "splitting_ops": ["vllm.unified_attention", "vllm.unified_attention_with_output"], "compile_sizes": [], "inductor_compile_config": {"enable_auto_functionalized_v2": false}, "use_cudagraph": true, "cudagraph_num_of_warmups": 1, "cudagraph_capture_sizes": [512, 504, 496, 488, 480, 472, 464, 456, 448, 440, 432, 424, 416, 408, 400, 392, 384, 376, 368, 360, 352, 344, 336, 328, 320, 312, 304, 296, 288, 280, 272, 264, 256, 248, 240, 232, 224, 216, 208, 200, 192, 184, 176, 168, 160, 152, 144, 136, 128, 120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 4, 2, 1], "max_capture_size": 512}
WARNING 06-26 11:45:43 [utils.py:2671] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xfffd04673c80>
INFO 06-26 11:45:44 [parallel_state.py:1064] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
INFO 06-26 11:45:44 [topk_topp_sampler.py:48] Using FlashInfer for top-p & top-k sampling.
INFO 06-26 11:45:45 [gpu_model_runner.py:1531] Starting to load model ibm-granite/granite-vision-3.1-2b-preview...
INFO 06-26 11:45:45 [cuda.py:217] Using Flash Attention backend on V1 engine.
INFO 06-26 11:45:45 [backends.py:35] Using InductorAdaptor
INFO 06-26 11:45:45 [weight_utils.py:291] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:00<00:00,  2.34it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00,  4.16it/s]
Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00,  3.72it/s]

INFO 06-26 11:45:46 [default_loader.py:280] Loading weights took 0.56 seconds
INFO 06-26 11:45:46 [gpu_model_runner.py:1549] Model loading took 5.5552 GiB and 1.013771 seconds
INFO 06-26 11:45:46 [gpu_model_runner.py:1863] Encoder cache will be initialized with a budget of 8289 tokens, and profiled with 1 image items of the maximum feature size.
ERROR 06-26 11:45:46 [core.py:500] EngineCore failed to start.
ERROR 06-26 11:45:46 [core.py:500] Traceback (most recent call last):
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 491, in run_engine_core
ERROR 06-26 11:45:46 [core.py:500]     engine_core = EngineCoreProc(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 390, in __init__
ERROR 06-26 11:45:46 [core.py:500]     super().__init__(vllm_config, executor_class, log_stats,
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 78, in __init__
ERROR 06-26 11:45:46 [core.py:500]     self._initialize_kv_caches(vllm_config)
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 137, in _initialize_kv_caches
ERROR 06-26 11:45:46 [core.py:500]     available_gpu_memory = self.model_executor.determine_available_memory()
ERROR 06-26 11:45:46 [core.py:500]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
ERROR 06-26 11:45:46 [core.py:500]     output = self.collective_rpc("determine_available_memory")
ERROR 06-26 11:45:46 [core.py:500]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
ERROR 06-26 11:45:46 [core.py:500]     answer = run_method(self.driver_worker, method, args, kwargs)
ERROR 06-26 11:45:46 [core.py:500]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/utils.py", line 2605, in run_method
ERROR 06-26 11:45:46 [core.py:500]     return func(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 06-26 11:45:46 [core.py:500]     return func(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
ERROR 06-26 11:45:46 [core.py:500]     self.model_runner.profile_run()
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1886, in profile_run
ERROR 06-26 11:45:46 [core.py:500]     dummy_encoder_outputs = self.model.get_multimodal_embeddings(
ERROR 06-26 11:45:46 [core.py:500]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 485, in get_multimodal_embeddings
ERROR 06-26 11:45:46 [core.py:500]     vision_embeddings = self._process_image_input(image_input)
ERROR 06-26 11:45:46 [core.py:500]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 460, in _process_image_input
ERROR 06-26 11:45:46 [core.py:500]     patch_embeddings = self._process_image_pixels(image_input)
ERROR 06-26 11:45:46 [core.py:500]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 437, in _process_image_pixels
ERROR 06-26 11:45:46 [core.py:500]     stacked_image_features = self._image_pixels_to_features(
ERROR 06-26 11:45:46 [core.py:500]                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 348, in _image_pixels_to_features
ERROR 06-26 11:45:46 [core.py:500]     image_features = vision_tower(
ERROR 06-26 11:45:46 [core.py:500]                      ^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 478, in forward
ERROR 06-26 11:45:46 [core.py:500]     return self.vision_model(
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 429, in forward
ERROR 06-26 11:45:46 [core.py:500]     encoder_outputs = self.encoder(
ERROR 06-26 11:45:46 [core.py:500]                       ^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 318, in forward
ERROR 06-26 11:45:46 [core.py:500]     hidden_states, _ = encoder_layer(hidden_states)
ERROR 06-26 11:45:46 [core.py:500]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 273, in forward
ERROR 06-26 11:45:46 [core.py:500]     hidden_states, _ = self.self_attn(hidden_states=hidden_states)
ERROR 06-26 11:45:46 [core.py:500]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 191, in forward
ERROR 06-26 11:45:46 [core.py:500]     out = self.attn(query_states, key_states, value_states)
ERROR 06-26 11:45:46 [core.py:500]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/attention/layer.py", line 316, in forward
ERROR 06-26 11:45:46 [core.py:500]     from xformers import ops as xops
ERROR 06-26 11:45:46 [core.py:500] ModuleNotFoundError: No module named 'xformers'
Process EngineCore_0:
Traceback (most recent call last):
  File "/usr/lib64/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib64/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 504, in run_engine_core
    raise e
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 491, in run_engine_core
    engine_core = EngineCoreProc(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 390, in __init__
    super().__init__(vllm_config, executor_class, log_stats,
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 78, in __init__
    self._initialize_kv_caches(vllm_config)
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 137, in _initialize_kv_caches
    available_gpu_memory = self.model_executor.determine_available_memory()
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
    output = self.collective_rpc("determine_available_memory")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/utils.py", line 2605, in run_method
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
    self.model_runner.profile_run()
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1886, in profile_run
    dummy_encoder_outputs = self.model.get_multimodal_embeddings(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 485, in get_multimodal_embeddings
    vision_embeddings = self._process_image_input(image_input)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 460, in _process_image_input
    patch_embeddings = self._process_image_pixels(image_input)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 437, in _process_image_pixels
    stacked_image_features = self._image_pixels_to_features(
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 348, in _image_pixels_to_features
    image_features = vision_tower(
                     ^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 478, in forward
    return self.vision_model(
           ^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 429, in forward
    encoder_outputs = self.encoder(
                      ^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 318, in forward
    hidden_states, _ = encoder_layer(hidden_states)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 273, in forward
    hidden_states, _ = self.self_attn(hidden_states=hidden_states)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 191, in forward
    out = self.attn(query_states, key_states, value_states)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/attention/layer.py", line 316, in forward
    from xformers import ops as xops
ModuleNotFoundError: No module named 'xformers'
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1376, in <module>
    uvloop.run(run_server(args))
  File "/opt/app-root/lib64/python3.12/site-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
           ^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/opt/app-root/lib64/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1324, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 153, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 185, in build_async_engine_client_from_engine_args
    async_llm = AsyncLLM.from_vllm_config(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 157, in from_vllm_config
    return cls(
           ^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 123, in __init__
    self.engine_core = core_client_class(
                       ^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 734, in __init__
    super().__init__(
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 418, in __init__
    self._wait_for_engine_startup(output_address, parallel_config)
  File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 484, in _wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
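
The final RuntimeError only says "See root cause above", and the root cause sits several hundred lines up. When comparing the GH200 log against the A100 and MI300x logs, a small script can pull the missing module and the importing file out of each log. This is a rough sketch, assuming the log has been saved as plain text; `find_missing_modules` is a hypothetical helper, and the patterns only cover the traceback layout seen in this ticket.

```python
import re

# Matches the exception line, with or without the "ERROR ... [core.py:500]" prefix.
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")
# Matches a traceback frame header such as: File "/path/layer.py", line 316, in forward
FRAME = re.compile(r'File "([^"]+)", line (\d+), in \S+')


def find_missing_modules(log_text):
    """Return unique (module, file, line) tuples for each ModuleNotFoundError.

    The file and line come from the closest traceback frame printed before
    the exception line; they are None if no frame was seen.
    """
    results = []
    last_frame = (None, None)
    for line in log_text.splitlines():
        frame = FRAME.search(line)
        if frame:
            last_frame = (frame.group(1), int(frame.group(2)))
            continue
        missing = MISSING_MODULE.search(line)
        if missing:
            entry = (missing.group(1), last_frame[0], last_frame[1])
            # The same failure is logged more than once, so skip repeats.
            if entry not in results:
                results.append(entry)
    return results
```

Applied to the log above, this would report the `xformers` import at line 316 of vllm/attention/layer.py once, even though the traceback is printed twice.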