$ ilab model serve INFO 2025-05-06 13:37:31,699 instructlab.model.serve_backend:79: Setting backend_type in the serve config to vllm INFO 2025-05-06 13:37:31,717 instructlab.model.serve_backend:85: Using model '/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2' with -1 gpu-layers and 4096 max context size. INFO 2025-05-06 13:37:31,835 instructlab.model.serve_backend:127: '--gpus' flag used alongside '--tensor-parallel-size' in the vllm_args section of the config file. Using value of the --gpus flag. INFO 2025-05-06 13:37:32,106 instructlab.model.backends.vllm:332: vLLM starting up on pid 83 at http://127.0.0.1:8000/v1 INFO 05-06 13:37:44 [__init__.py:239] Automatically detected platform cuda. INFO 05-06 13:37:46 [api_server.py:1034] vLLM API server version 0.8.4 INFO 05-06 13:37:46 [api_server.py:1035] args: Namespace(host='127.0.0.1', port=8000, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template='/tmp/tmp5gxhizcg', chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, load_format='auto', download_dir=None, model_loader_extra_config=None, use_tqdm_on_load=True, config_format=, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='auto', 
logits_processor_pattern=None, model_impl='auto', distributed_executor_backend='mp', pipeline_parallel_size=1, tensor_parallel_size=8, data_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, disable_custom_all_reduce=False, block_size=None, enable_prefix_caching=None, prefix_caching_hash_algo='builtin', disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_token=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2', 'granite-3.1-8b-lab-v2', 'models/granite-3.1-8b-lab-v2', 'models/granite-3.1-8b-starter-v2', 'models/mixtral-8x7b-instruct-v0-1', 'models/prometheus-8x7b-v2-0'], qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', 
override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_cascade_attn=False, disable_chunked_mm_input=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False) INFO 05-06 13:37:55 [config.py:689] This model supports multiple tasks: {'reward', 'generate', 'embed', 'score', 'classify'}. Defaulting to 'generate'. INFO 05-06 13:37:55 [config.py:1901] Chunked prefill is enabled with max_num_batched_tokens=8192. INFO 05-06 13:38:01 [__init__.py:239] Automatically detected platform cuda. INFO 05-06 13:38:03 [core.py:61] Initializing a V1 LLM engine (v0.8.4) with config: model='/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2', speculative_config=None, tokenizer='/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=True, 
use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"level":3,"custom_ops":["none"],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output"],"use_inductor":true,"compile_sizes":[],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":512} WARNING 05-06 13:38:03 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed. INFO 05-06 13:38:03 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0, 1, 2, 3, 4, 5, 6, 7], buffer_handle=(8, 10485760, 10, 'psm_41357e8f'), local_subscribe_addr='ipc:///tmp/98015b41-8e98-430f-8606-21f1f7294292', remote_subscribe_addr=None, remote_addr_ipv6=False) INFO 05-06 13:38:07 [__init__.py:239] Automatically detected platform cuda. WARNING 05-06 13:38:11 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in (VllmWorker rank=0 pid=151) INFO 05-06 13:38:11 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_55757610'), local_subscribe_addr='ipc:///tmp/f3e613e3-3fc4-459e-b4e3-ddd9c99d3fdf', remote_subscribe_addr=None, remote_addr_ipv6=False) INFO 05-06 13:38:16 [__init__.py:239] Automatically detected platform cuda. 
WARNING 05-06 13:38:19 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in (VllmWorker rank=1 pid=165) INFO 05-06 13:38:19 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_a860dc3d'), local_subscribe_addr='ipc:///tmp/6a0da149-7980-49b5-a379-2b70e6b639e8', remote_subscribe_addr=None, remote_addr_ipv6=False) INFO 05-06 13:38:23 [__init__.py:239] Automatically detected platform cuda. WARNING 05-06 13:38:26 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in (VllmWorker rank=2 pid=181) INFO 05-06 13:38:26 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_f751c11e'), local_subscribe_addr='ipc:///tmp/7f99e35e-06f2-4310-9986-0295083ee610', remote_subscribe_addr=None, remote_addr_ipv6=False) INFO 05-06 13:38:31 [__init__.py:239] Automatically detected platform cuda. WARNING 05-06 13:38:34 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in (VllmWorker rank=3 pid=201) INFO 05-06 13:38:34 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_28386058'), local_subscribe_addr='ipc:///tmp/f92d4a77-a2dc-4828-8c91-14d63a012aeb', remote_subscribe_addr=None, remote_addr_ipv6=False) INFO 05-06 13:38:38 [__init__.py:239] Automatically detected platform cuda. 
WARNING 05-06 13:38:41 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in (VllmWorker rank=4 pid=221) INFO 05-06 13:38:41 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_9d6e89d1'), local_subscribe_addr='ipc:///tmp/3ee99e8b-c9e0-408b-9317-29ff9a77e1a9', remote_subscribe_addr=None, remote_addr_ipv6=False) INFO 05-06 13:38:45 [__init__.py:239] Automatically detected platform cuda. WARNING 05-06 13:38:48 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in (VllmWorker rank=5 pid=241) INFO 05-06 13:38:48 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_bf10fd86'), local_subscribe_addr='ipc:///tmp/8ac736df-c1f7-4c99-a7fc-177e5945d467', remote_subscribe_addr=None, remote_addr_ipv6=False) INFO 05-06 13:38:53 [__init__.py:239] Automatically detected platform cuda. WARNING 05-06 13:38:56 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in (VllmWorker rank=6 pid=261) INFO 05-06 13:38:56 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_38ff9c1a'), local_subscribe_addr='ipc:///tmp/37dc40b4-ec87-42cd-b9b8-15430e95b55a', remote_subscribe_addr=None, remote_addr_ipv6=False) INFO 05-06 13:39:00 [__init__.py:239] Automatically detected platform cuda. 
WARNING 05-06 13:39:03 [utils.py:2444] Methods determine_num_available_blocks,device_config,get_cache_block_size_bytes,initialize_cache not implemented in (VllmWorker rank=7 pid=281) INFO 05-06 13:39:03 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[0], buffer_handle=(1, 10485760, 10, 'psm_ffd66eda'), local_subscribe_addr='ipc:///tmp/0aa54852-099c-4d89-aa58-cca95ff2c2e6', remote_subscribe_addr=None, remote_addr_ipv6=False) (VllmWorker rank=4 pid=221) INFO 05-06 13:39:04 [utils.py:993] Found nccl from library libnccl.so.2 (VllmWorker rank=1 pid=165) INFO 05-06 13:39:04 [utils.py:993] Found nccl from library libnccl.so.2 (VllmWorker rank=5 pid=241) INFO 05-06 13:39:04 [utils.py:993] Found nccl from library libnccl.so.2 (VllmWorker rank=3 pid=201) INFO 05-06 13:39:04 [utils.py:993] Found nccl from library libnccl.so.2 (VllmWorker rank=7 pid=281) INFO 05-06 13:39:04 [utils.py:993] Found nccl from library libnccl.so.2 (VllmWorker rank=1 pid=165) INFO 05-06 13:39:04 [pynccl.py:69] vLLM is using nccl==2.26.2 (VllmWorker rank=4 pid=221) INFO 05-06 13:39:04 [pynccl.py:69] vLLM is using nccl==2.26.2 (VllmWorker rank=5 pid=241) INFO 05-06 13:39:04 [pynccl.py:69] vLLM is using nccl==2.26.2 (VllmWorker rank=3 pid=201) INFO 05-06 13:39:04 [pynccl.py:69] vLLM is using nccl==2.26.2 (VllmWorker rank=7 pid=281) INFO 05-06 13:39:04 [pynccl.py:69] vLLM is using nccl==2.26.2 (VllmWorker rank=2 pid=181) INFO 05-06 13:39:04 [utils.py:993] Found nccl from library libnccl.so.2 (VllmWorker rank=0 pid=151) INFO 05-06 13:39:04 [utils.py:993] Found nccl from library libnccl.so.2 (VllmWorker rank=6 pid=261) INFO 05-06 13:39:04 [utils.py:993] Found nccl from library libnccl.so.2 (VllmWorker rank=2 pid=181) INFO 05-06 13:39:04 [pynccl.py:69] vLLM is using nccl==2.26.2 (VllmWorker rank=0 pid=151) INFO 05-06 13:39:04 [pynccl.py:69] vLLM is using nccl==2.26.2 (VllmWorker rank=6 pid=261) INFO 05-06 13:39:04 [pynccl.py:69] vLLM is using nccl==2.26.2 
(VllmWorker rank=5 pid=241) INFO 05-06 13:39:06 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /var/home/cloud-user/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json (VllmWorker rank=7 pid=281) INFO 05-06 13:39:06 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /var/home/cloud-user/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json (VllmWorker rank=6 pid=261) INFO 05-06 13:39:06 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /var/home/cloud-user/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json (VllmWorker rank=4 pid=221) INFO 05-06 13:39:06 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /var/home/cloud-user/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json (VllmWorker rank=1 pid=165) INFO 05-06 13:39:06 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /var/home/cloud-user/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json (VllmWorker rank=3 pid=201) INFO 05-06 13:39:06 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /var/home/cloud-user/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json (VllmWorker rank=0 pid=151) INFO 05-06 13:39:06 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /var/home/cloud-user/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json (VllmWorker rank=2 pid=181) INFO 05-06 13:39:06 [custom_all_reduce_utils.py:244] reading GPU P2P access cache from /var/home/cloud-user/.cache/vllm/gpu_p2p_access_cache_for_0,1,2,3,4,5,6,7.json (VllmWorker rank=0 pid=151) INFO 05-06 13:39:06 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3, 4, 5, 6, 7], buffer_handle=(7, 4194304, 6, 'psm_bcdc9bd7'), local_subscribe_addr='ipc:///tmp/ed2f25fc-7585-49c2-bbe3-47eeb68735ee', remote_subscribe_addr=None, remote_addr_ipv6=False) (VllmWorker rank=7 pid=281) INFO 05-06 13:39:06 [parallel_state.py:959] rank 7 in world size 8 is assigned as DP 
rank 0, PP rank 0, TP rank 7 (VllmWorker rank=6 pid=261) INFO 05-06 13:39:06 [parallel_state.py:959] rank 6 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 6 (VllmWorker rank=4 pid=221) INFO 05-06 13:39:06 [parallel_state.py:959] rank 4 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 4 (VllmWorker rank=3 pid=201) INFO 05-06 13:39:06 [parallel_state.py:959] rank 3 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 3 (VllmWorker rank=2 pid=181) INFO 05-06 13:39:06 [parallel_state.py:959] rank 2 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 2 (VllmWorker rank=1 pid=165) INFO 05-06 13:39:06 [parallel_state.py:959] rank 1 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 1 (VllmWorker rank=5 pid=241) INFO 05-06 13:39:06 [parallel_state.py:959] rank 5 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 5 (VllmWorker rank=0 pid=151) INFO 05-06 13:39:06 [parallel_state.py:959] rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0 (VllmWorker rank=7 pid=281) INFO 05-06 13:39:06 [cuda.py:221] Using Flash Attention backend on V1 engine. (VllmWorker rank=4 pid=221) INFO 05-06 13:39:06 [cuda.py:221] Using Flash Attention backend on V1 engine. (VllmWorker rank=6 pid=261) INFO 05-06 13:39:06 [cuda.py:221] Using Flash Attention backend on V1 engine. (VllmWorker rank=1 pid=165) INFO 05-06 13:39:06 [cuda.py:221] Using Flash Attention backend on V1 engine. (VllmWorker rank=2 pid=181) INFO 05-06 13:39:06 [cuda.py:221] Using Flash Attention backend on V1 engine. (VllmWorker rank=5 pid=241) INFO 05-06 13:39:06 [cuda.py:221] Using Flash Attention backend on V1 engine. (VllmWorker rank=3 pid=201) INFO 05-06 13:39:06 [cuda.py:221] Using Flash Attention backend on V1 engine. (VllmWorker rank=0 pid=151) INFO 05-06 13:39:06 [cuda.py:221] Using Flash Attention backend on V1 engine. 
(VllmWorker rank=1 pid=165) INFO 05-06 13:39:06 [gpu_model_runner.py:1276] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2... (VllmWorker rank=5 pid=241) INFO 05-06 13:39:06 [gpu_model_runner.py:1276] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2... (VllmWorker rank=4 pid=221) INFO 05-06 13:39:06 [gpu_model_runner.py:1276] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2... (VllmWorker rank=3 pid=201) INFO 05-06 13:39:06 [gpu_model_runner.py:1276] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2... (VllmWorker rank=7 pid=281) INFO 05-06 13:39:06 [gpu_model_runner.py:1276] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2... (VllmWorker rank=2 pid=181) INFO 05-06 13:39:06 [gpu_model_runner.py:1276] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2... (VllmWorker rank=6 pid=261) INFO 05-06 13:39:06 [gpu_model_runner.py:1276] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2... (VllmWorker rank=0 pid=151) INFO 05-06 13:39:06 [gpu_model_runner.py:1276] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2... (VllmWorker rank=4 pid=221) INFO 05-06 13:39:06 [topk_topp_sampler.py:44] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling. (VllmWorker rank=1 pid=165) INFO 05-06 13:39:06 [topk_topp_sampler.py:44] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling. 
(VllmWorker rank=5 pid=241) INFO 05-06 13:39:06 [topk_topp_sampler.py:44] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling. (VllmWorker rank=3 pid=201) INFO 05-06 13:39:06 [topk_topp_sampler.py:44] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling. (VllmWorker rank=7 pid=281) INFO 05-06 13:39:06 [topk_topp_sampler.py:44] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling. (VllmWorker rank=2 pid=181) INFO 05-06 13:39:06 [topk_topp_sampler.py:44] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling. (VllmWorker rank=6 pid=261) INFO 05-06 13:39:06 [topk_topp_sampler.py:44] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling. (VllmWorker rank=0 pid=151) INFO 05-06 13:39:06 [topk_topp_sampler.py:44] Currently, FlashInfer top-p & top-k sampling sampler is disabled because FlashInfer>=v0.2.3 is not backward compatible. Falling back to the PyTorch-native implementation of top-p & top-k sampling. 
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00= tmp1 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp3 = tl.full([1], 12304, tl.int32) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp4 = tmp0 < tmp3 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp5 = tmp2 & tmp4 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp6 = tl.full([1], 49160, tl.int32) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp7 = tmp0 >= tmp6 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp8 = tmp0 < tmp6 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp9 = tmp7 & tmp8 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp10 = tmp5 | tmp9 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp11 = tmp10 == 0 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp12 = tmp10.to(tl.int64) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp13 = tmp0.to(tl.int64) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp14 = tmp5.to(tl.int64) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp15 = tl.full([1], 6152, tl.int64) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp16 = tmp14 * tmp15 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp17 = tmp9.to(tl.int64) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp18 = tl.full([1], 43008, tl.int64) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp19 = tmp17 * tmp18 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp20 = tmp16 + tmp19 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp21 = tmp13 - tmp20 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp22 = tmp12 * tmp21 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp23 = tl.full([XBLOCK], 6152, 
tl.int32) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp24 = tmp22 + tmp23 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp25 = tmp22 < 0 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp26 = tl.where(tmp25, tmp24, tmp22) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.device_assert((0 <= tmp26) & (tmp26 < 6152), "index out of bounds: 0 <= tmp26 < 6152") (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp28 = tl.load(in_ptr1 + (x0 + 4096*tmp26), None).to(tl.float32) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp29 = 0.0 (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp30 = tl.where(tmp11, tmp29, tmp28) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.store(out_ptr0 + (x2), tmp30, None) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] metadata: {'signature': {'in_ptr0': '*i32', 'in_ptr1': '*bf16', 
'out_ptr0': '*bf16', 'xnumel': 'i32'}, 'device': 1, 'constants': {'XBLOCK': 1024}, 'configs': [AttrsDescriptor.from_dict({'arg_properties': {'tt.divisibility': (0, 1, 2, 3), 'tt.equal_to': ()}, 'cls': 'AttrsDescriptor'})], 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 90} (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Traceback (most recent call last): (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] next_module = compile_ir(module, metadata) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in <lambda> (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] 
[0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=1 pid=165) [rank1]:E0506 13:39:19.294000 165 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Triton compilation failed: triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] def triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xoffset = tl.program_id(0) * XBLOCK (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xmask = tl.full([XBLOCK], 
True, tl.int1) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x1 = xindex // 4096 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x0 = (xindex % 4096) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x2 = xindex (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp0 = tl.load(in_ptr0 + (x1), None, eviction_policy='evict_last') (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp1 = tl.full([1], 36912, tl.int32) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp2 = tmp0 >= tmp1 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp3 = tl.full([1], 43064, tl.int32) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp4 = tmp0 < tmp3 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp5 = tmp2 & tmp4 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp6 = tl.full([1], 49160, tl.int32) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp7 = tmp0 >= tmp6 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp8 = tmp0 < tmp6 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp9 = tmp7 & tmp8 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp10 = tmp5 | tmp9 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp11 = tmp10 == 0 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp12 = tmp10.to(tl.int64) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp13 = tmp0.to(tl.int64) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp14 = tmp5.to(tl.int64) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp15 = tl.full([1], 36912, tl.int64) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp16 = tmp14 * tmp15 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp17 = tmp9.to(tl.int64) (VllmWorker rank=6 pid=261) 
[rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp18 = tl.full([1], 43008, tl.int64) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp19 = tmp17 * tmp18 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp20 = tmp16 + tmp19 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp21 = tmp13 - tmp20 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp22 = tmp12 * tmp21 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp23 = tl.full([XBLOCK], 6152, tl.int32) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp24 = tmp22 + tmp23 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp25 = tmp22 < 0 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp26 = tl.where(tmp25, tmp24, tmp22) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.device_assert((0 <= tmp26) & (tmp26 < 6152), "index out of bounds: 0 <= tmp26 < 6152") (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp28 = tl.load(in_ptr1 + (x0 + 4096*tmp26), None).to(tl.float32) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp29 = 0.0 (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp30 = tl.where(tmp11, tmp29, tmp28) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.store(out_ptr0 + (x2), tmp30, None) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] metadata: {'signature': {'in_ptr0': '*i32', 'in_ptr1': '*bf16', 'out_ptr0': '*bf16', 'xnumel': 'i32'}, 'device': 6, 'constants': {'XBLOCK': 1024}, 'configs': [AttrsDescriptor.from_dict({'arg_properties': {'tt.divisibility': (0, 1, 2, 3), 'tt.equal_to': ()}, 'cls': 'AttrsDescriptor'})], 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 90} (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Traceback (most recent call last): (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] next_module = compile_ir(module, metadata) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", 
line 262, in make_llir (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=6 pid=261) [rank6]:E0506 13:39:19.294000 261 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Triton compilation failed: triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] def triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xoffset = tl.program_id(0) * XBLOCK (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xmask = tl.full([XBLOCK], True, tl.int1) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x1 = xindex // 4096 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x0 = (xindex % 4096) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x2 = xindex (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp0 = tl.load(in_ptr0 + (x1), None, eviction_policy='evict_last') (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp1 = tl.full([1], 43064, tl.int32) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp2 = tmp0 >= tmp1 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp3 = tl.full([1], 49160, tl.int32) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp4 = tmp0 < tmp3 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp5 = tmp2 & tmp4 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp6 = tmp0 >= tmp3 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp7 = tmp6 & tmp4 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp8 = tmp5 | tmp7 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp9 = tmp8 == 0 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp10 = tmp8.to(tl.int64) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp11 = tmp0.to(tl.int64) (VllmWorker rank=7 pid=281) [rank7]:E0506 
13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp12 = tmp5.to(tl.int64) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp13 = tl.full([1], 43064, tl.int64) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp14 = tmp12 * tmp13 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp15 = tmp7.to(tl.int64) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp16 = tl.full([1], 43008, tl.int64) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp17 = tmp15 * tmp16 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp18 = tmp14 + tmp17 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp19 = tmp11 - tmp18 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp20 = tmp10 * tmp19 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp21 = tl.full([XBLOCK], 6152, tl.int32) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] 
tmp22 = tmp20 + tmp21 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp23 = tmp20 < 0 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp24 = tl.where(tmp23, tmp22, tmp20) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.device_assert((0 <= tmp24) & (tmp24 < 6152), "index out of bounds: 0 <= tmp24 < 6152") (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp26 = tl.load(in_ptr1 + (x0 + 4096*tmp24), None).to(tl.float32) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp27 = 0.0 (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp28 = tl.where(tmp9, tmp27, tmp26) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.store(out_ptr0 + (x2), tmp28, None) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] metadata: {'signature': {'in_ptr0': '*i32', 'in_ptr1': '*bf16', 'out_ptr0': '*bf16', 'xnumel': 'i32'}, 'device': 7, 'constants': {'XBLOCK': 1024}, 'configs': [AttrsDescriptor.from_dict({'arg_properties': {'tt.divisibility': (0, 1, 2, 3), 
'tt.equal_to': ()}, 'cls': 'AttrsDescriptor'})], 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 90} (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Traceback (most recent call last): (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] next_module = compile_ir(module, metadata) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=7 pid=281) [rank7]:E0506 
13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File 
"/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=7 pid=281) [rank7]:E0506 13:39:19.295000 281 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Triton compilation failed: triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Triton compilation failed: triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] def triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] def triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xoffset = tl.program_id(0) * 
XBLOCK (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xoffset = tl.program_id(0) * XBLOCK (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xmask = tl.full([XBLOCK], True, tl.int1) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xmask = tl.full([XBLOCK], True, tl.int1) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x1 = xindex // 4096 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x1 = xindex // 4096 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x0 = (xindex % 4096) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x0 = (xindex % 4096) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x2 = xindex (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x2 = xindex (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp0 = tl.load(in_ptr0 + (x1), None, eviction_policy='evict_last') (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp0 = tl.load(in_ptr0 + (x1), None, eviction_policy='evict_last') (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp1 = tl.full([1], 18456, tl.int32) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp1 = tl.full([1], 30760, tl.int32) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp2 = tmp0 >= tmp1 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp2 = tmp0 >= tmp1 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp3 = tl.full([1], 24608, tl.int32) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp3 = tl.full([1], 36912, tl.int32) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp4 = tmp0 < tmp3 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp4 = tmp0 < tmp3 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp5 = tmp2 & tmp4 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp5 = tmp2 & tmp4 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp6 = tl.full([1], 49160, tl.int32) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp6 = tl.full([1], 49160, tl.int32) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp7 = tmp0 >= tmp6 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp7 = tmp0 >= tmp6 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp8 = tmp0 < tmp6 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp8 = tmp0 < tmp6 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp9 = tmp7 & tmp8 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp9 = tmp7 & tmp8 (VllmWorker rank=3 pid=201) [rank3]:E0506 
13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp10 = tmp5 | tmp9 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp10 = tmp5 | tmp9 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp11 = tmp10 == 0 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp11 = tmp10 == 0 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp12 = tmp10.to(tl.int64) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp12 = tmp10.to(tl.int64) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp13 = tmp0.to(tl.int64) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp13 = tmp0.to(tl.int64) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp14 = tmp5.to(tl.int64) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp14 = tmp5.to(tl.int64) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp15 = tl.full([1], 18456, tl.int64) 
(VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp15 = tl.full([1], 30760, tl.int64) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp16 = tmp14 * tmp15 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp16 = tmp14 * tmp15 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp17 = tmp9.to(tl.int64) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp17 = tmp9.to(tl.int64) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp18 = tl.full([1], 43008, tl.int64) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp18 = tl.full([1], 43008, tl.int64) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp19 = tmp17 * tmp18 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp19 = tmp17 * tmp18 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp20 = tmp16 + tmp19 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp20 = tmp16 + tmp19 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp21 = tmp13 - tmp20 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp21 = tmp13 - tmp20 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp22 = tmp12 * tmp21 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp22 = tmp12 * tmp21 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp23 = tl.full([XBLOCK], 6152, tl.int32) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp23 = tl.full([XBLOCK], 6152, tl.int32) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp24 = tmp22 + tmp23 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp24 = tmp22 + tmp23 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp25 = tmp22 < 0 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp25 = tmp22 < 0 (VllmWorker rank=3 pid=201) 
[rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp26 = tl.where(tmp25, tmp24, tmp22) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp26 = tl.where(tmp25, tmp24, tmp22) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.device_assert((0 <= tmp26) & (tmp26 < 6152), "index out of bounds: 0 <= tmp26 < 6152") (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.device_assert((0 <= tmp26) & (tmp26 < 6152), "index out of bounds: 0 <= tmp26 < 6152") (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp28 = tl.load(in_ptr1 + (x0 + 4096*tmp26), None).to(tl.float32) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp28 = tl.load(in_ptr1 + (x0 + 4096*tmp26), None).to(tl.float32) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp29 = 0.0 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp29 = 0.0 (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp30 = tl.where(tmp11, tmp29, tmp28) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp30 = tl.where(tmp11, tmp29, tmp28) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.store(out_ptr0 + (x2), tmp30, None) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.store(out_ptr0 + (x2), tmp30, None) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] metadata: {'signature': {'in_ptr0': '*i32', 'in_ptr1': '*bf16', 'out_ptr0': '*bf16', 'xnumel': 'i32'}, 'device': 3, 'constants': {'XBLOCK': 1024}, 'configs': [AttrsDescriptor.from_dict({'arg_properties': {'tt.divisibility': (0, 1, 2, 3), 'tt.equal_to': ()}, 'cls': 'AttrsDescriptor'})], 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 90} (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] metadata: {'signature': {'in_ptr0': '*i32', 'in_ptr1': '*bf16', 'out_ptr0': '*bf16', 'xnumel': 'i32'}, 'device': 5, 'constants': {'XBLOCK': 1024}, 'configs': [AttrsDescriptor.from_dict({'arg_properties': {'tt.divisibility': (0, 1, 2, 3), 'tt.equal_to': ()}, 'cls': 'AttrsDescriptor'})], 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 90} (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Traceback (most recent call last): (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Traceback (most recent call last): (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", 
line 279, in compile (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] next_module = compile_ir(module, metadata) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] next_module = compile_ir(module, metadata) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] 
[0/0] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=3 pid=201) [rank3]:E0506 13:39:19.296000 201 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=5 pid=241) [rank5]:E0506 13:39:19.296000 241 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Triton compilation failed: triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] def triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xoffset = tl.program_id(0) * XBLOCK (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 
181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xmask = tl.full([XBLOCK], True, tl.int1) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x1 = xindex // 4096 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x0 = (xindex % 4096) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x2 = xindex (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp0 = tl.load(in_ptr0 + (x1), None, eviction_policy='evict_last') (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp1 = tl.full([1], 12304, tl.int32) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp2 = tmp0 >= tmp1 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp3 = tl.full([1], 18456, tl.int32) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp4 = tmp0 < tmp3 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp5 = tmp2 & tmp4 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp6 
= tl.full([1], 49160, tl.int32) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp7 = tmp0 >= tmp6 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp8 = tmp0 < tmp6 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp9 = tmp7 & tmp8 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp10 = tmp5 | tmp9 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp11 = tmp10 == 0 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp12 = tmp10.to(tl.int64) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp13 = tmp0.to(tl.int64) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp14 = tmp5.to(tl.int64) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp15 = tl.full([1], 12304, tl.int64) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp16 = tmp14 * tmp15 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp17 = tmp9.to(tl.int64) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp18 = tl.full([1], 43008, tl.int64) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp19 = tmp17 * tmp18 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp20 = tmp16 + tmp19 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp21 = tmp13 - tmp20 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp22 = tmp12 * tmp21 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp23 = tl.full([XBLOCK], 6152, tl.int32) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp24 = tmp22 + tmp23 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp25 = tmp22 < 0 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp26 = tl.where(tmp25, tmp24, tmp22) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.device_assert((0 <= tmp26) 
& (tmp26 < 6152), "index out of bounds: 0 <= tmp26 < 6152") (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp28 = tl.load(in_ptr1 + (x0 + 4096*tmp26), None).to(tl.float32) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp29 = 0.0 (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp30 = tl.where(tmp11, tmp29, tmp28) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.store(out_ptr0 + (x2), tmp30, None) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] metadata: {'signature': {'in_ptr0': '*i32', 'in_ptr1': '*bf16', 'out_ptr0': '*bf16', 'xnumel': 'i32'}, 'device': 2, 'constants': {'XBLOCK': 1024}, 'configs': [AttrsDescriptor.from_dict({'arg_properties': {'tt.divisibility': (0, 1, 2, 3), 'tt.equal_to': ()}, 'cls': 'AttrsDescriptor'})], 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 90} (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Traceback (most recent call last): (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", 
line 511, in _precompile_config (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] next_module = compile_ir(module, metadata) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] 
[0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=2 pid=181) [rank2]:E0506 13:39:19.297000 181 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 
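
Note on the error above: the message reads as contradictory, since 12.8 is higher than 10.0. One plausible reading, which is an assumption and is not confirmed anywhere in this log, is that the CUDA-to-PTX version lookup only recognizes releases that were known when it was written, and falls through to this generic error for anything newer. A minimal sketch of that failure mode follows. The table `KNOWN_PTX` and the function `lookup_ptx_version` are hypothetical names with illustrative entries; this is not Triton's actual code.

```python
# Hypothetical sketch of a version lookup with a gap; NOT Triton's implementation.
# The table entries are illustrative only.
KNOWN_PTX = {"12.4": 84, "12.6": 85}


def lookup_ptx_version(cuda_version: str) -> int:
    """Return a PTX version for a known CUDA version, else raise the generic error."""
    try:
        return KNOWN_PTX[cuda_version]
    except KeyError:
        # A newer release lands here, so the text blames a "too old" CUDA
        # even though the real cause is an unrecognized newer version.
        raise RuntimeError(
            "Triton only support CUDA 10.0 or higher, but got CUDA version: "
            + cuda_version
        ) from None


try:
    lookup_ptx_version("12.8")
except RuntimeError as exc:
    message = str(exc)
    print(message)
```

If that reading holds, the mismatch is between the installed Triton build and the CUDA toolkit version it detects, rather than CUDA being too old.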
(VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Triton compilation failed: triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] def triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xoffset = tl.program_id(0) * XBLOCK (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xmask = tl.full([XBLOCK], True, tl.int1) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x1 = xindex // 4096 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x0 = (xindex % 4096) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x2 = xindex (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp0 = tl.load(in_ptr0 + (x1), None, eviction_policy='evict_last') (VllmWorker rank=4 pid=221) [rank4]:E0506 
13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp1 = tl.full([1], 24608, tl.int32) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp2 = tmp0 >= tmp1 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp3 = tl.full([1], 30760, tl.int32) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp4 = tmp0 < tmp3 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp5 = tmp2 & tmp4 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp6 = tl.full([1], 49160, tl.int32) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp7 = tmp0 >= tmp6 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp8 = tmp0 < tmp6 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp9 = tmp7 & tmp8 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp10 = tmp5 | tmp9 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp11 = tmp10 == 0 (VllmWorker 
rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp12 = tmp10.to(tl.int64) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp13 = tmp0.to(tl.int64) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp14 = tmp5.to(tl.int64) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp15 = tl.full([1], 24608, tl.int64) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp16 = tmp14 * tmp15 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp17 = tmp9.to(tl.int64) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp18 = tl.full([1], 43008, tl.int64) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp19 = tmp17 * tmp18 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp20 = tmp16 + tmp19 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp21 = tmp13 - tmp20 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp22 = tmp12 * tmp21 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp23 = tl.full([XBLOCK], 6152, tl.int32) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp24 = tmp22 + tmp23 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp25 = tmp22 < 0 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp26 = tl.where(tmp25, tmp24, tmp22) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.device_assert((0 <= tmp26) & (tmp26 < 6152), "index out of bounds: 0 <= tmp26 < 6152") (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp28 = tl.load(in_ptr1 + (x0 + 4096*tmp26), None).to(tl.float32) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp29 = 0.0 (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp30 = tl.where(tmp11, tmp29, tmp28) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.store(out_ptr0 + (x2), tmp30, None) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] metadata: {'signature': {'in_ptr0': '*i32', 'in_ptr1': '*bf16', 'out_ptr0': '*bf16', 'xnumel': 'i32'}, 'device': 4, 'constants': {'XBLOCK': 1024}, 'configs': [AttrsDescriptor.from_dict({'arg_properties': {'tt.divisibility': (0, 1, 2, 3), 'tt.equal_to': ()}, 'cls': 'AttrsDescriptor'})], 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 90} (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Traceback (most recent call last): (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] next_module = compile_ir(module, 
metadata) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=4 pid=221) [rank4]:E0506 13:39:19.301000 221 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] WorkerProc hit an exception: %s (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=1 pid=165) 
ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", 
line 750, in _compile_inner (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Triton compilation failed: triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] def triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr): (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xoffset = tl.program_id(0) * XBLOCK 
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xindex = xoffset + tl.arange(0, XBLOCK)[:] (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] xmask = tl.full([XBLOCK], True, tl.int1) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x1 = xindex // 4096 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x0 = (xindex % 4096) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] x2 = xindex (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp0 = 
tl.load(in_ptr0 + (x1), None, eviction_policy='evict_last') (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp1 = tl.full([1], 0, tl.int32) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp2 = tmp0 >= tmp1 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp3 = tl.full([1], 6152, tl.int32) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp4 = tmp0 < tmp3 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp5 = tmp2 & tmp4 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp6 = tl.full([1], 49160, tl.int32) (VllmWorker 
rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp7 = tmp0 >= tmp6 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp8 = tmp0 < tmp6 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp9 = tmp7 & tmp8 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp10 = tmp5 | tmp9 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp11 = tmp10 == 0 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp12 = tmp10.to(tl.int64) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp13 = tmp0.to(tl.int64) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp14 = tmp5.to(tl.int64) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp15 = tl.full([1], 0, tl.int64) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp16 = tmp14 * tmp15 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp17 = tmp9.to(tl.int64) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp18 = tl.full([1], 43008, tl.int64) (VllmWorker rank=1 pid=165) ERROR 
05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp19 = tmp17 * tmp18 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp20 = tmp16 + tmp19 (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] WorkerProc hit an exception: %s (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp21 = tmp13 - tmp20 (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp22 = tmp12 * tmp21 (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp23 = tl.full([XBLOCK], 6152, tl.int32) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp24 = tmp22 + tmp23 (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp25 = tmp22 < 0 (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp26 = tl.where(tmp25, tmp24, tmp22) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.device_assert((0 <= tmp26) & (tmp26 < 6152), "index out of bounds: 0 <= tmp26 < 6152") (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp28 = tl.load(in_ptr1 + (x0 + 4096*tmp26), None).to(tl.float32) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp29 = 0.0 (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tmp30 = tl.where(tmp11, tmp29, tmp28) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] tl.store(out_ptr0 + (x2), tmp30, None) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, 
in _compile_to_module (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] metadata: {'signature': {'in_ptr0': '*i32', 'in_ptr1': '*bf16', 'out_ptr0': '*bf16', 'xnumel': 'i32'}, 'device': 0, 'constants': {'XBLOCK': 1024}, 'configs': [AttrsDescriptor.from_dict({'arg_properties': {'tt.divisibility': (0, 1, 2, 3), 'tt.equal_to': ()}, 'cls': 'AttrsDescriptor'})], 'device_type': 'cuda', 'num_warps': 4, 'num_stages': 1, 'debug': True, 'cc': 90} (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] Traceback (most recent call last): (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] next_module = compile_ir(module, metadata) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_1_0/inductor_cache/hk/chkduqcye5vtyxb3cm33cbwx3nkycpq22poxdnhfh6jz54mtuwtm.py", line 77, in (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 
opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile 
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=0 pid=151) [rank0]:E0506 13:39:19.322000 151 opt/app-root/lib/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py:513] [0/0] RuntimeError: Triton only 
support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 
1380, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] WorkerProc hit an exception: %s (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=1 pid=165) 
ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=7 
pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=7 
pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=1 pid=165) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=1 pid=165) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
super().run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] WorkerProc hit an exception: %s (VllmWorker rank=6 pid=261) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker 
rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, 
in aot_module_simplified (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] WorkerProc hit an exception: %s (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker 
rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=3 pid=201) ERROR 
05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, 
compiler_name="inductor")( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=2 
pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker 
rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = 
PyCodeCache.load_by_key_path( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = 
compiler(fw_module, updated_flat_args) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_6_0/inductor_cache/oh/coh5ct5plz23iy6tfvamvicfpchdjsubswiyj2cygvqdfjbzjwuf.py", line 77, in (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
self.output.compile_subgraph( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker 
rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] WorkerProc hit an exception: %s (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=1 pid=165) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) 
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) 
ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 
12.8 (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=5 
pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=2 pid=181) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=5 pid=241) 
ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_7_0/inductor_cache/35/c35mdr555sopgpgb6u5q7jri5nk3ydwhecrdtqkhuhnruhfw4f4o.py", line 77, in <module> (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = 
async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_1_0/inductor_cache/hk/chkduqcye5vtyxb3cm33cbwx3nkycpq22poxdnhfh6jz54mtuwtm.py", line 77, in <module> (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile()
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx(
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile()
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config(
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable(
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd(
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config(
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model(
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs)
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile(
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile(
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs)
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile(
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs)
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile()
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors,
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata)
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx(
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx(
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function(
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd(
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function(
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] WorkerProc hit an exception: %s
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs)
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last):
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn(
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability)
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised:
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile()
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable(
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args)
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380]
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run()
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {})
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback:
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380]
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function(
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380]
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380]
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function(
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting:
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run()
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs)
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile(
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step():
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn(
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380]
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version)
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run()
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last):
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised:
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform)
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args)
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens)
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380]
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile(
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {})
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform)
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph(
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback:
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs)
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds)
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph(
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile(
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds)
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380]
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner(
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run()
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model(
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run()
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens)
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm)
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run()
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler
(VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380]
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs)
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm)
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner(
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while
self.step(): (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=1 pid=165) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] WorkerProc hit an exception: %s (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most 
recent call last): (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker 
rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 
05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in 
_call_impl (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_2_0/inductor_cache/jh/cjhxxli2ajtaxt5jbbzmp35uhkrvkdpazvtdo3jsxa4m2bralhx3.py", line 77, in (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = 
async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_3_0/inductor_cache/eq/ceq7b64wvtfb34ww4btsedqy55ahpydwzh65l4b43gijl2vylx7d.py", line 77, in (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=4 pid=221) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=5 pid=241) ERROR 
05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=2 pid=181) ERROR 
05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker 
rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = 
compile_ir(module, metadata) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) 
(VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in 
transform_code_object (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in 
_fn (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, 
%l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker 
rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function 
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( 
(VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=4 
pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return 
self._call_user_compiler(gm) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker 
rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", 
line 116, in decorate_context (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = 
self.compiler.compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in 
call_module (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ 
(VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=6 pid=261) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ 
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker 
rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=5 pid=241) ERROR 
05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in 
compile_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_6_0/inductor_cache/oh/coh5ct5plz23iy6tfvamvicfpchdjsubswiyj2cygvqdfjbzjwuf.py", line 77, in (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in 
_precompile_config (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_5_0/inductor_cache/b3/cb3tcry537opvrig5dogrwcrn6cjlszgnrn5vg63bfvroya5c7r3.py", line 77, in (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = 
torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner 
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=6 pid=261) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=3 pid=201) ERROR 
05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, 
in _create_aot_dispatcher_function (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=5 pid=241) 
ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, 
example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=6 pid=261) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in 
codegen_and_compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_4_0/inductor_cache/r4/cr4b7nns4mgz4snnbfvlx4wmdnjkoap7rm43d553gbex57epooih.py", line 77, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, 
%l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_7_0/inductor_cache/35/c35mdr555sopgpgb6u5q7jri5nk3ydwhecrdtqkhuhnruhfw4f4o.py", line 77, in (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 
1129, in fx_codegen_and_compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ 
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=5 
pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run 
(VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=3 
pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] Original traceback: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_0_0/inductor_cache/d6/cd67hr5viy4pfcumn5fu4da6ukuppgy7hplnawpemgm3gmxs4khr.py", line 77, in (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 
2385, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=7 pid=281) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = 
get_ptx_version_from_options(options) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = 
aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: 
(VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( 
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=4 
pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Traceback (most recent call last): (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 375, in worker_busy_loop (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) 
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = func(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_worker.py", line 157, in determine_available_memory (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.model_runner.profile_run() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=3 pid=201) 
ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1591, in profile_run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( 
(VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = self._dummy_run(self.max_num_tokens) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=3 pid=201) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/worker/gpu_model_runner.py", line 1441, in _dummy_run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] hidden_states = model( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_impl(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] super().run() (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker 
rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return forward_call(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/models/granite.py", line 456, in forward (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] model_output = self.model(input_ids, positions, intermediate_tensors, (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/decorators.py", line 238, in __call__ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_2_0/inductor_cache/jh/cjhxxli2ajtaxt5jbbzmp35uhkrvkdpazvtdo3jsxa4m2bralhx3.py", line 77, in (VllmWorker rank=3 
pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = self.compiled_callable(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 574, in _fn (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
compiled_binary, launcher = self._precompile_config( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker 
rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in <lambda> (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 1380, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._torchdynamo_orig_callable( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 547, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 986, in _compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] guarded_code = compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 715, in compile_inner (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _compile_inner(code, one_graph, hooks, transform) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_3_0/inductor_cache/eq/ceq7b64wvtfb34ww4btsedqy55ahpydwzh65l4b43gijl2vylx7d.py", line 77, in (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_utils_internal.py", line 95, in wrapper_function (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return function(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 
05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 750, in _compile_inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] out_code = transform_code_object(code, transform) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1361, in transform_code_object (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return 
super().run(*fake_args) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] transformations(instructions, code_options) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 231, in _fn (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return fn(*args, **kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=3 pid=201) ERROR 05-06 
13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 662, in transform (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] tracer.run() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2868, in run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] super().run() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1052, in run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] while self.step(): (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 962, in step (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.dispatch_table[inst.opcode](self, inst) (VllmWorker rank=5 pid=241) 
ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=2 pid=181) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3048, in RETURN_VALUE (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self._return(inst) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 3033, in _return (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 
pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.output.compile_subgraph( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1101, in compile_subgraph (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.compile_and_call_fx_graph( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 
or higher, but got CUDA version: 12.8 (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1382, in compile_and_call_fx_graph (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = self.call_user_compiler(gm) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker 
rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1432, in call_user_compiler (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._call_user_compiler(gm) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1483, in _call_user_compiler (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise 
BackendCompilerFailed(self.compiler_fn, e).with_traceback( (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/output_graph.py", line 1462, in _call_user_compiler (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = compiler_fn(gm, self.example_inputs()) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", line 130, in __call__ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=3 pid=201) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_dynamo.py", 
line 130, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_gm = compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/__init__.py", line 2385, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(model_, inputs_, **self.kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 
pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 455, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] PiecewiseCompileInterpreter(self.split_gm, submod_names_to_compile, (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_5_0/inductor_cache/b3/cb3tcry537opvrig5dogrwcrn6cjlszgnrn5vg63bfvroya5c7r3.py", line 77, in (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' 
(VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 245, in run (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return super().run(*fake_args) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 167, in run (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] self.env[node] = self.run_node(node) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self.compiler_fn(gm, example_inputs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/fx/interpreter.py", line 230, in run_node (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return getattr(self, n.op)(n.target, args, kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 261, in call_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiler_manager.compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/backends.py", line 121, in compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph, handle = self.compiler.compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=5 
pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 293, in compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_graph = compile_fx( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=4 pid=221) ERROR 
05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1552, in compile_fx (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return compile_fx( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1863, in compile_fx (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or 
higher, but got CUDA version: 12.8 (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return aot_autograd( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/backends/common.py", line 83, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] cg = aot_module_simplified(gm, example_inputs, **self.kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1155, in aot_module_simplified (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = dispatch_and_compile() (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 1131, in dispatch_and_compile (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, _ = create_aot_dispatcher_function( (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 580, in create_aot_dispatcher_function (VllmWorker rank=5 pid=241) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return _create_aot_dispatcher_function( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker 
rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 830, in _create_aot_dispatcher_function (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn, fw_metadata = compiler_fn( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_4_0/inductor_cache/r4/cr4b7nns4mgz4snnbfvlx4wmdnjkoap7rm43d553gbex57epooih.py", line 77, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 203, in aot_dispatch_base (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fw = compiler(fw_module, updated_flat_args) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_functorch/aot_autograd.py", line 489, in __call__ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return 
self.compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1741, in fw_compiler_base (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return inner_compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/usr/lib64/python3.11/contextlib.py", line 81, in inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return func(*args, **kwds) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^ 
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/vllm/compilation/compiler_interface.py", line 228, in hijacked_compile_fx_inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] output = torch._inductor.compile_fx.compile_fx_inner( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 569, in compile_fx_inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 
[multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/repro/after_aot.py", line 102, in debug_wrapper (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] inner_compiled_fn = compiler_fn(gm, example_inputs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 685, in _compile_fx_inner (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mb_compiled_graph = fx_codegen_and_compile( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version 
(VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1129, in fx_codegen_and_compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/compile_fx.py", line 1044, in codegen_and_compile (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_fn = graph.compile_to_module().call (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ 
(VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2027, in compile_to_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] return self._compile_to_module() (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/graph.py", line 2068, in _compile_to_module (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = PyCodeCache.load_by_key_path( (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=4 pid=221) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/codecache.py", line 2759, in load_by_key_path (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] mod = _reload_python_module(key, path) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/compile_tasks.py", line 45, in _reload_python_module (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] exec(code, mod.__dict__, mod.__dict__) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/var/home/cloud-user/.cache/vllm/torch_compile_cache/218b17c62f/rank_0_0/inductor_cache/d6/cd67hr5viy4pfcumn5fu4da6ukuppgy7hplnawpemgm3gmxs4khr.py", line 77, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0 = async_compile.triton('triton_poi_fused_add_all_reduce_bitwise_and_bitwise_or_embedding_ge_lt_masked_fill_mul_sub_0', ''' (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/async_compile.py", line 213, in triton (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] kernel.precompile() (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 293, in precompile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] compiled_binary, launcher = self._precompile_config( 
(VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/torch/_inductor/runtime/triton_heuristics.py", line 511, in _precompile_config (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] binary = triton.compile(*compile_args, **compile_kwargs) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/compiler/compiler.py", line 279, in compile (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] next_module = compile_ir(module, metadata) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 391, in (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] stages["llir"] = lambda src, metadata: self.make_llir(src, metadata, options, self.capability) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 262, in make_llir (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = get_ptx_version_from_options(options) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File 
"/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 71, in get_ptx_version_from_options (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ptx_version = ptx_get_version(cuda_version) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] File "/opt/app-root/lib64/python3.11/site-packages/triton/backends/nvidia/compiler.py", line 64, in ptx_get_version (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] raise RuntimeError("Triton only support CUDA 10.0 or higher, but got CUDA version: " + cuda_version) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.exc.BackendCompilerFailed: backend='' raised: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] RuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8 (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] While executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {}) (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Original traceback: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] None (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for 
more information (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] You can suppress this exception and fall back to eager by setting: (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] import torch._dynamo (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] torch._dynamo.config.suppress_errors = True (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] (VllmWorker rank=0 pid=151) ERROR 05-06 13:39:19 [multiproc_executor.py:380] ERROR 05-06 13:39:19 [core.py:387] EngineCore hit an exception: Traceback (most recent call last): ERROR 05-06 13:39:19 [core.py:387] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 378, in run_engine_core ERROR 05-06 13:39:19 [core.py:387] engine_core = EngineCoreProc(*args, **kwargs) ERROR 05-06 13:39:19 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 05-06 13:39:19 [core.py:387] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 320, in __init__ ERROR 05-06 13:39:19 [core.py:387] super().__init__(vllm_config, executor_class, log_stats) ERROR 05-06 13:39:19 [core.py:387] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 71, in __init__ ERROR 05-06 13:39:19 [core.py:387] self._initialize_kv_caches(vllm_config) ERROR 05-06 13:39:19 [core.py:387] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/engine/core.py", line 133, in _initialize_kv_caches ERROR 05-06 13:39:19 [core.py:387] available_gpu_memory = self.model_executor.determine_available_memory() ERROR 05-06 13:39:19 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ERROR 05-06 13:39:19 [core.py:387] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/abstract.py", line 66, in determine_available_memory 
ERROR 05-06 13:39:19 [core.py:387] output = self.collective_rpc("determine_available_memory")
ERROR 05-06 13:39:19 [core.py:387] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 05-06 13:39:19 [core.py:387] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 133, in collective_rpc
ERROR 05-06 13:39:19 [core.py:387] raise e
ERROR 05-06 13:39:19 [core.py:387] File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 122, in collective_rpc
ERROR 05-06 13:39:19 [core.py:387] raise RuntimeError(
ERROR 05-06 13:39:19 [core.py:387] RuntimeError: ('Worker failed with error %s, please check the stack trace above for the root cause', 'backend=\'\' raised:\nRuntimeError: Triton only support CUDA 10.0 or higher, but got CUDA version: 12.8\n\nWhile executing %submod_0 : [num_users=5] = call_module[target=submod_0](args = (%l_input_ids_, %s0, %l_self_modules_embed_tokens_parameters_weight_, %l_self_modules_layers_modules_0_modules_input_layernorm_parameters_weight_, %l_self_modules_layers_modules_0_modules_self_attn_modules_qkv_proj_parameters_weight_, %l_positions_, %l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_), kwargs = {})\nOriginal traceback:\nNone\n\nSet TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information\n\n\nYou can suppress this exception and fall back to eager by setting:\n import torch._dynamo\n torch._dynamo.config.suppress_errors = True\n')
ERROR 05-06 13:39:19 [core.py:387]
CRITICAL 05-06 13:39:19 [core_client.py:359] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
(VllmWorker rank=0 pid=151) Exception ignored in: . at 0x7fe9023b7b00>
(VllmWorker rank=0 pid=151) Traceback (most recent call last):
(VllmWorker rank=0 pid=151) File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/utils.py", line 589, in
(VllmWorker rank=0 pid=151) self.refs[idx] = weakref.ref(key, lambda ref: self._remove_id(idx))
(VllmWorker rank=0 pid=151)
(VllmWorker rank=0 pid=151) File "/opt/app-root/lib64/python3.11/site-packages/vllm/v1/executor/multiproc_executor.py", line 308, in signal_handler
(VllmWorker rank=0 pid=151) raise SystemExit()
(VllmWorker rank=0 pid=151) SystemExit:
(VllmWorker rank=5 pid=241) Exception ignored in: .remove at 0x7ff308ae2840>
(VllmWorker rank=5 pid=241) Traceback (most recent call last):
(VllmWorker rank=5 pid=241) File "/opt/app-root/lib64/python3.11/site-packages/torch/utils/weak.py", line 125, in remove
^CINFO 2025-05-06 13:41:00,510 instructlab.model.backends.vllm:85: vLLM server terminated by keyboard
INFO 2025-05-06 13:41:00,511 instructlab.model.backends.vllm:512: Waiting for GPU VRAM reclamation...