AI Platform Core Components / AIPCC-2152

Granite vision model fails to deploy on GH200 machine due to missing xformers package

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • RHAIIS-3.1
    • RHAIIS-3.1
    • Wheel Package Index
    • AP Sprint 9

      While testing the ibm-granite/granite-vision-3.1-2b-preview model on the GH200 machine, we observed that model deployment fails because vLLM cannot import the xformers package (ModuleNotFoundError: No module named 'xformers').

      Notes:

      1. The model deploys successfully on the NVIDIA A100 (logs attached).
      2. The model also deploys successfully on the AMD MI300X.
      3. We verified that the xformers package (v0.1.19) is already present in the container image.
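
Note 3 can be re-checked on the failing host itself. The following is a minimal sketch, assuming it is run with the same Python interpreter that starts vLLM inside the container; the helper name `describe_package` is hypothetical and not part of vLLM or the image. It also prints the machine architecture, since the GH200 pairs the GPU with an Arm-based Grace CPU and the image pulled there may not be the same build in which xformers was verified (an assumption, not something the logs confirm).

```python
import importlib.metadata
import importlib.util
import platform


def describe_package(name: str) -> dict:
    """Report whether `name` is importable by this interpreter and, if known, its version."""
    spec = importlib.util.find_spec(name)  # None when the top-level module cannot be found
    version = None
    if spec is not None:
        try:
            version = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name.
            version = None
    return {
        "package": name,
        "importable": spec is not None,
        "version": version,
        "machine": platform.machine(),  # e.g. "x86_64" or "aarch64"
        "python": platform.python_version(),
    }


if __name__ == "__main__":
    for pkg in ("xformers", "vllm", "torch"):
        print(describe_package(pkg))
```

Running the same script in the A100 and GH200 containers and comparing the output should show whether the two environments actually contain the same set of packages.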

      GH200 logs:

      podman run -ti --rm --pull=newer --userns=keep-id:uid=1001 --shm-size=4g -p 8000:8000 --env "HF_HUB_OFFLINE=0" -v ./home/rhaiis-cache:/opt/app-root/src/.cache:Z --name=rhaiis --device=nvidia.com/gpu=all quay.io/aipcc/rhaiis/cuda-ubi9:3.1-0-1750864644 --model ibm-granite/granite-vision-3.1-2b-preview --max-model-len 10000 --enable-chunked-prefill
      INFO 06-26 11:45:34 [__init__.py:243] Automatically detected platform cuda.
      INFO 06-26 11:45:35 [__init__.py:31] Available plugins for group vllm.general_plugins:
      INFO 06-26 11:45:35 [__init__.py:33] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
      INFO 06-26 11:45:35 [__init__.py:36] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
      INFO 06-26 11:45:35 [api_server.py:1289] vLLM API server version 0.9.0.1
      INFO 06-26 11:45:36 [cli_args.py:300] non-default args: {'model': 'ibm-granite/granite-vision-3.1-2b-preview', 'max_model_len': 10000, 'enable_chunked_prefill': True}
      INFO 06-26 11:45:42 [config.py:793] This model supports multiple tasks: {'classify', 'reward', 'generate', 'embed', 'score'}. Defaulting to 'generate'.
      INFO 06-26 11:45:42 [config.py:2118] Chunked prefill is enabled with max_num_batched_tokens=8192.
      INFO 06-26 11:45:43 [core.py:438] Waiting for init message from front-end.
      INFO 06-26 11:45:43 [core.py:65] Initializing a V1 LLM engine (v0.9.0.1) with config: model='ibm-granite/granite-vision-3.1-2b-preview', speculative_config=None, tokenizer='ibm-granite/granite-vision-3.1-2b-preview', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config={}, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=10000, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_backend=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=ibm-granite/granite-vision-3.1-2b-preview, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, use_async_output_proc=True, pooler_config=None, compilation_config={"level": 3, "custom_ops": ["none"], "splitting_ops": ["vllm.unified_attention", "vllm.unified_attention_with_output"], "compile_sizes": [], "inductor_compile_config": {"enable_auto_functionalized_v2": false}, "use_cudagraph": true, "cudagraph_num_of_warmups": 1, "cudagraph_capture_sizes": [512, 504, 496, 488, 480, 472, 464, 456, 448, 440, 432, 424, 416, 408, 400, 392, 384, 376, 368, 360, 352, 344, 336, 328, 320, 312, 304, 296, 288, 280, 272, 264, 256, 248, 240, 232, 224, 216, 208, 200, 192, 184, 176, 168, 160, 152, 144, 136, 128, 120, 112, 104, 96, 88, 80, 72, 64, 56, 48, 40, 32, 24, 16, 8, 4, 2, 1], "max_capture_size": 512}
      WARNING 06-26 11:45:43 [utils.py:2671] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in <vllm.v1.worker.gpu_worker.Worker object at 0xfffd04673c80>
      INFO 06-26 11:45:44 [parallel_state.py:1064] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
      Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
      INFO 06-26 11:45:44 [topk_topp_sampler.py:48] Using FlashInfer for top-p & top-k sampling.
      INFO 06-26 11:45:45 [gpu_model_runner.py:1531] Starting to load model ibm-granite/granite-vision-3.1-2b-preview...
      INFO 06-26 11:45:45 [cuda.py:217] Using Flash Attention backend on V1 engine.
      INFO 06-26 11:45:45 [backends.py:35] Using InductorAdaptor
      INFO 06-26 11:45:45 [weight_utils.py:291] Using model weights format ['*.safetensors']
      Loading safetensors checkpoint shards:   0% Completed | 0/2 [00:00<?, ?it/s]
      Loading safetensors checkpoint shards:  50% Completed | 1/2 [00:00<00:00,  2.34it/s]
      Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00,  4.16it/s]
      Loading safetensors checkpoint shards: 100% Completed | 2/2 [00:00<00:00,  3.72it/s]
      
      INFO 06-26 11:45:46 [default_loader.py:280] Loading weights took 0.56 seconds
      INFO 06-26 11:45:46 [gpu_model_runner.py:1549] Model loading took 5.5552 GiB and 1.013771 seconds
      INFO 06-26 11:45:46 [gpu_model_runner.py:1863] Encoder cache will be initialized with a budget of 8289 tokens, and profiled with 1 image items of the maximum feature size.
      ERROR 06-26 11:45:46 [core.py:500] EngineCore failed to start.
      ERROR 06-26 11:45:46 [core.py:500] Traceback (most recent call last):
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 491, in run_engine_core
      ERROR 06-26 11:45:46 [core.py:500]     engine_core = EngineCoreProc(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 390, in __init__
      ERROR 06-26 11:45:46 [core.py:500]     super().__init__(vllm_config, executor_class, log_stats,
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 78, in __init__
      ERROR 06-26 11:45:46 [core.py:500]     self._initialize_kv_caches(vllm_config)
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 137, in _initialize_kv_caches
      ERROR 06-26 11:45:46 [core.py:500]     available_gpu_memory = self.model_executor.determine_available_memory()
      ERROR 06-26 11:45:46 [core.py:500]                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
      ERROR 06-26 11:45:46 [core.py:500]     output = self.collective_rpc("determine_available_memory")
      ERROR 06-26 11:45:46 [core.py:500]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
      ERROR 06-26 11:45:46 [core.py:500]     answer = run_method(self.driver_worker, method, args, kwargs)
      ERROR 06-26 11:45:46 [core.py:500]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/utils.py", line 2605, in run_method
      ERROR 06-26 11:45:46 [core.py:500]     return func(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
      ERROR 06-26 11:45:46 [core.py:500]     return func(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
      ERROR 06-26 11:45:46 [core.py:500]     self.model_runner.profile_run()
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1886, in profile_run
      ERROR 06-26 11:45:46 [core.py:500]     dummy_encoder_outputs = self.model.get_multimodal_embeddings(
      ERROR 06-26 11:45:46 [core.py:500]                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 485, in get_multimodal_embeddings
      ERROR 06-26 11:45:46 [core.py:500]     vision_embeddings = self._process_image_input(image_input)
      ERROR 06-26 11:45:46 [core.py:500]                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 460, in _process_image_input
      ERROR 06-26 11:45:46 [core.py:500]     patch_embeddings = self._process_image_pixels(image_input)
      ERROR 06-26 11:45:46 [core.py:500]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 437, in _process_image_pixels
      ERROR 06-26 11:45:46 [core.py:500]     stacked_image_features = self._image_pixels_to_features(
      ERROR 06-26 11:45:46 [core.py:500]                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 348, in _image_pixels_to_features
      ERROR 06-26 11:45:46 [core.py:500]     image_features = vision_tower(
      ERROR 06-26 11:45:46 [core.py:500]                      ^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 478, in forward
      ERROR 06-26 11:45:46 [core.py:500]     return self.vision_model(
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 429, in forward
      ERROR 06-26 11:45:46 [core.py:500]     encoder_outputs = self.encoder(
      ERROR 06-26 11:45:46 [core.py:500]                       ^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 318, in forward
      ERROR 06-26 11:45:46 [core.py:500]     hidden_states, _ = encoder_layer(hidden_states)
      ERROR 06-26 11:45:46 [core.py:500]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 273, in forward
      ERROR 06-26 11:45:46 [core.py:500]     hidden_states, _ = self.self_attn(hidden_states=hidden_states)
      ERROR 06-26 11:45:46 [core.py:500]                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 191, in forward
      ERROR 06-26 11:45:46 [core.py:500]     out = self.attn(query_states, key_states, value_states)
      ERROR 06-26 11:45:46 [core.py:500]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return self._call_impl(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
      ERROR 06-26 11:45:46 [core.py:500]     return forward_call(*args, **kwargs)
      ERROR 06-26 11:45:46 [core.py:500]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      ERROR 06-26 11:45:46 [core.py:500]   File "/opt/app-root/lib64/python3.12/site-packages/vllm/attention/layer.py", line 316, in forward
      ERROR 06-26 11:45:46 [core.py:500]     from xformers import ops as xops
      ERROR 06-26 11:45:46 [core.py:500] ModuleNotFoundError: No module named 'xformers'
      Process EngineCore_0:
      Traceback (most recent call last):
        File "/usr/lib64/python3.12/multiprocessing/process.py", line 314, in _bootstrap
          self.run()
        File "/usr/lib64/python3.12/multiprocessing/process.py", line 108, in run
          self._target(*self._args, **self._kwargs)
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 504, in run_engine_core
          raise e
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 491, in run_engine_core
          engine_core = EngineCoreProc(*args, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 390, in __init__
          super().__init__(vllm_config, executor_class, log_stats,
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 78, in __init__
          self._initialize_kv_caches(vllm_config)
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core.py", line 137, in _initialize_kv_caches
          available_gpu_memory = self.model_executor.determine_available_memory()
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/executor/abstract.py", line 75, in determine_available_memory
          output = self.collective_rpc("determine_available_memory")
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
          answer = run_method(self.driver_worker, method, args, kwargs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/utils.py", line 2605, in run_method
          return func(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
          return func(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_worker.py", line 185, in determine_available_memory
          self.model_runner.profile_run()
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1886, in profile_run
          dummy_encoder_outputs = self.model.get_multimodal_embeddings(
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 485, in get_multimodal_embeddings
          vision_embeddings = self._process_image_input(image_input)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 460, in _process_image_input
          patch_embeddings = self._process_image_pixels(image_input)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 437, in _process_image_pixels
          stacked_image_features = self._image_pixels_to_features(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/llava_next.py", line 348, in _image_pixels_to_features
          image_features = vision_tower(
                           ^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
          return self._call_impl(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
          return forward_call(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 478, in forward
          return self.vision_model(
                 ^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
          return self._call_impl(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
          return forward_call(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 429, in forward
          encoder_outputs = self.encoder(
                            ^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
          return self._call_impl(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
          return forward_call(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 318, in forward
          hidden_states, _ = encoder_layer(hidden_states)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
          return self._call_impl(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
          return forward_call(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 273, in forward
          hidden_states, _ = self.self_attn(hidden_states=hidden_states)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
          return self._call_impl(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
          return forward_call(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/model_executor/models/siglip.py", line 191, in forward
          out = self.attn(query_states, key_states, value_states)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
          return self._call_impl(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
          return forward_call(*args, **kwargs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/attention/layer.py", line 316, in forward
          from xformers import ops as xops
      ModuleNotFoundError: No module named 'xformers'
      Traceback (most recent call last):
        File "<frozen runpy>", line 198, in _run_module_as_main
        File "<frozen runpy>", line 88, in _run_code
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1376, in <module>
          uvloop.run(run_server(args))
        File "/opt/app-root/lib64/python3.12/site-packages/uvloop/__init__.py", line 109, in run
          return __asyncio.run(
                 ^^^^^^^^^^^^^^
        File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
          return runner.run(main)
                 ^^^^^^^^^^^^^^^^
        File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
          return self._loop.run_until_complete(task)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
        File "/opt/app-root/lib64/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
          return await main
                 ^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1324, in run_server
          async with build_async_engine_client(args) as engine_client:
        File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__
          return await anext(self.gen)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 153, in build_async_engine_client
          async with build_async_engine_client_from_engine_args(
        File "/usr/lib64/python3.12/contextlib.py", line 210, in __aenter__
          return await anext(self.gen)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 185, in build_async_engine_client_from_engine_args
          async_llm = AsyncLLM.from_vllm_config(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 157, in from_vllm_config
          return cls(
                 ^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 123, in __init__
          self.engine_core = core_client_class(
                             ^^^^^^^^^^^^^^^^^^
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 734, in __init__
          super().__init__(
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 418, in __init__
          self._wait_for_engine_startup(output_address, parallel_config)
        File "/opt/app-root/lib64/python3.12/site-packages/vllm/v1/engine/core_client.py", line 484, in _wait_for_engine_startup
          raise RuntimeError("Engine core initialization failed. "
      RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
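
The final RuntimeError only points to a "root cause above", which is easy to miss in a log of this length. Below is a rough sketch for pulling the exception lines out of a saved copy of the log. The function name `find_exceptions` and both regular expressions are assumptions written against the log format shown in this ticket, not a vLLM utility, and the matching is a heuristic.

```python
import re

# Optional vLLM log prefix, e.g. "ERROR 06-26 11:45:46 [core.py:500] ".
LOG_PREFIX = re.compile(
    r"^\s*(?:[A-Z]+ \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\]\s?)?"
)
# Final traceback line, e.g. "ModuleNotFoundError: No module named 'xformers'".
EXCEPTION_LINE = re.compile(
    r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<message>.*)$"
)


def find_exceptions(log_text: str) -> list[tuple[str, str]]:
    """Return (exception type, message) pairs in order of first appearance, without repeats."""
    seen: list[tuple[str, str]] = []
    for raw in log_text.splitlines():
        # Drop leading indentation and the log prefix, if present.
        line = LOG_PREFIX.sub("", raw, count=1)
        match = EXCEPTION_LINE.match(line)
        if match:
            item = (match["type"], match["message"])
            if item not in seen:
                seen.append(item)
    return seen
```

Applied to the log above, the first entry returned is the ModuleNotFoundError for xformers raised in the engine core process, and the second is the RuntimeError that the API server raises in response.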
      
      

              cheimes@redhat.com Christian Heimes
              takumar@redhat.com Tarun Kumar