[cloud-user@lab-mi300x ~]$ ilab model list
+-----------------------------------+---------------------+---------+
| Model Name                        | Last Modified       | Size    |
+-----------------------------------+---------------------+---------+
| models/granite-3.1-8b-lab-v1      | 2025-04-14 20:28:27 | 15.2 GB |
| models/granite-3.1-8b-starter-v1  | 2025-04-14 20:29:29 | 15.2 GB |
| models/mixtral-8x7b-instruct-v0-1 | 2025-04-14 20:43:04 | 87.0 GB |
| models/prometheus-8x7b-v2-0       | 2025-04-14 20:43:17 | 87.0 GB |
+-----------------------------------+---------------------+---------+
[cloud-user@lab-mi300x ~]$ ilab data generate --enable-serving-output
INFO 2025-04-14 21:04:49,400 instructlab.process.process:241: Started subprocess with PID 1. Logs are being written to /var/home/cloud-user/.local/share/instructlab/logs/generation/generation-1a243206-1974-11f0-bfc1-6045bd01a29c.log.
INFO 2025-04-14 21:04:50,266 instructlab.model.backends.vllm:115: Trying to connect to model server at http://127.0.0.1:8000/v1
INFO 2025-04-14 21:04:51,546 instructlab.model.backends.vllm:332: vLLM starting up on pid 5 at http://127.0.0.1:56423/v1
INFO 2025-04-14 21:04:51,546 instructlab.model.backends.vllm:123: Starting a temporary vLLM server at http://127.0.0.1:56423/v1
INFO 2025-04-14 21:04:51,546 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 1/120
INFO 2025-04-14 21:04:54,743 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 2/120
INFO 2025-04-14 21:04:58,022 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 3/120
INFO 2025-04-14 21:05:01,408 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 4/120
WARNING 04-14 21:05:02 rocm.py:34] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn` instead.
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
INFO 04-14 21:05:03 api_server.py:643] vLLM API server version 0.6.4.post1
INFO 04-14 21:05:03 api_server.py:644] args: Namespace(host='127.0.0.1', port=56423, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=[LoRAModulePath(name='skill-classifier-v3-clm', path='/var/home/cloud-user/.cache/instructlab/models/skills-adapter-v3', base_model_name=None), LoRAModulePath(name='text-classifier-knowledge-v3-clm', path='/var/home/cloud-user/.cache/instructlab/models/knowledge-adapter-v3', base_model_name=None)], prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=, dtype='bfloat16', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, distributed_executor_backend='mp', worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=4, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=True, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, mm_cache_preprocessor=False, enable_lora=True, enable_lora_bias=False, max_loras=1, max_lora_rank=64, lora_extra_vocab_size=256, lora_dtype='bfloat16', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=True, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', calculate_kv_scales=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
INFO 04-14 21:05:03 api_server.py:198] Started engine process with PID 25
INFO 2025-04-14 21:05:04,752 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 5/120
INFO 2025-04-14 21:05:08,159 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 6/120
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
INFO 04-14 21:05:11 config.py:444] This model supports multiple tasks: {'reward', 'classify', 'embed', 'generate', 'score'}. Defaulting to 'generate'.
INFO 2025-04-14 21:05:11,500 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 7/120
INFO 2025-04-14 21:05:14,843 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 8/120
INFO 04-14 21:05:15 config.py:444] This model supports multiple tasks: {'reward', 'classify', 'score', 'embed', 'generate'}. Defaulting to 'generate'.
INFO 04-14 21:05:16 llm_engine.py:249] Initializing an LLM engine (v0.6.4.post1) with config: model='/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1', speculative_config=None, tokenizer='/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=False, use_async_output_proc=True, mm_cache_preprocessor=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
WARNING 04-14 21:05:16 multiproc_worker_utils.py:312] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 04-14 21:05:16 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
INFO 04-14 21:05:16 selector.py:134] Using ROCmFlashAttention backend.
INFO 2025-04-14 21:05:18,199 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 9/120
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
INFO 2025-04-14 21:05:21,563 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 10/120
(VllmWorkerProcess pid=159) INFO 04-14 21:05:22 selector.py:134] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=158) INFO 04-14 21:05:22 selector.py:134] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=159) INFO 04-14 21:05:22 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
(VllmWorkerProcess pid=158) INFO 04-14 21:05:22 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
(VllmWorkerProcess pid=157) INFO 04-14 21:05:23 selector.py:134] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=157) INFO 04-14 21:05:23 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
(VllmWorkerProcess pid=159) INFO 04-14 21:05:23 utils.py:1086] Found nccl from library librccl.so.1
INFO 04-14 21:05:23 utils.py:1086] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=159) INFO 04-14 21:05:23 pynccl.py:69] vLLM is using nccl==2.20.5
INFO 04-14 21:05:23 pynccl.py:69] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=157) INFO 04-14 21:05:23 utils.py:1086] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=157) INFO 04-14 21:05:23 pynccl.py:69] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=158) INFO 04-14 21:05:23 utils.py:1086] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=158) INFO 04-14 21:05:23 pynccl.py:69] vLLM is using nccl==2.20.5
INFO 04-14 21:05:24 shm_broadcast.py:255] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_8cae487d'), local_subscribe_port=60111, remote_subscribe_port=None)
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236] Traceback (most recent call last):
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 230, in _run_worker_process
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 180, in init_device
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     set_random_seed(self.model_config.seed)
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/utils.py", line 10, in set_random_seed
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     current_platform.seed_everything(seed)
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/platforms/interface.py", line 187, in seed_everything
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     torch.manual_seed(seed)
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_compile.py", line 31, in inner
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     return disable_fn(*args, **kwargs)
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/random.py", line 46, in manual_seed
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     torch.cuda.manual_seed_all(seed)
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 127, in manual_seed_all
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     _lazy_call(cb, seed_all=True)
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py", line 244, in _lazy_call
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     callable()
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 124, in cb
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     default_generator = torch.cuda.default_generators[i]
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(VllmWorkerProcess pid=159) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236] IndexError: tuple index out of range
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236] Traceback (most recent call last):
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 230, in _run_worker_process
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 180, in init_device
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     set_random_seed(self.model_config.seed)
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/utils.py", line 10, in set_random_seed
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     current_platform.seed_everything(seed)
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/platforms/interface.py", line 187, in seed_everything
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     torch.manual_seed(seed)
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_compile.py", line 31, in inner
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     return disable_fn(*args, **kwargs)
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/random.py", line 46, in manual_seed
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     torch.cuda.manual_seed_all(seed)
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 127, in manual_seed_all
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     _lazy_call(cb, seed_all=True)
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py", line 244, in _lazy_call
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     callable()
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 124, in cb
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     default_generator = torch.cuda.default_generators[i]
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(VllmWorkerProcess pid=158) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236] IndexError: tuple index out of range
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236] Traceback (most recent call last):
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 230, in _run_worker_process
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 180, in init_device
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     set_random_seed(self.model_config.seed)
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/utils.py", line 10, in set_random_seed
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     current_platform.seed_everything(seed)
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/platforms/interface.py", line 187, in seed_everything
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     torch.manual_seed(seed)
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_compile.py", line 31, in inner
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     return disable_fn(*args, **kwargs)
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/random.py", line 46, in manual_seed
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     torch.cuda.manual_seed_all(seed)
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 127, in manual_seed_all
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     _lazy_call(cb, seed_all=True)
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py", line 244, in _lazy_call
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     callable()
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 124, in cb
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]     default_generator = torch.cuda.default_generators[i]
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236]                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(VllmWorkerProcess pid=157) ERROR 04-14 21:05:24 multiproc_worker_utils.py:236] IndexError: tuple index out of range
ERROR 04-14 21:05:24 engine.py:366] tuple index out of range
ERROR 04-14 21:05:24 engine.py:366] Traceback (most recent call last):
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 04-14 21:05:24 engine.py:366]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 04-14 21:05:24 engine.py:366]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 04-14 21:05:24 engine.py:366]     return cls(ipc_path=ipc_path,
ERROR 04-14 21:05:24 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
ERROR 04-14 21:05:24 engine.py:366]     self.engine = LLMEngine(*args, **kwargs)
ERROR 04-14 21:05:24 engine.py:366]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 288, in __init__
ERROR 04-14 21:05:24 engine.py:366]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 04-14 21:05:24 engine.py:366]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
ERROR 04-14 21:05:24 engine.py:366]     super().__init__(*args, **kwargs)
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 36, in __init__
ERROR 04-14 21:05:24 engine.py:366]     self._init_executor()
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 82, in _init_executor
ERROR 04-14 21:05:24 engine.py:366]     self._run_workers("init_device")
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 161, in _run_workers
ERROR 04-14 21:05:24 engine.py:366]     ] + [output.get() for output in worker_outputs]
ERROR 04-14 21:05:24 engine.py:366]     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 161, in
ERROR 04-14 21:05:24 engine.py:366]     ] + [output.get() for output in worker_outputs]
ERROR 04-14 21:05:24 engine.py:366]          ^^^^^^^^^^^^
ERROR 04-14 21:05:24 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 61, in get
ERROR 04-14 21:05:24 engine.py:366]     raise self.result.exception
ERROR 04-14 21:05:24 engine.py:366] IndexError: tuple index out of range
ERROR 04-14 21:05:24 multiproc_worker_utils.py:123] Worker VllmWorkerProcess pid 158 died, exit code: -15
INFO 04-14 21:05:24 multiproc_worker_utils.py:127] Killing local vLLM worker processes
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
    raise e
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
    return cls(ipc_path=ipc_path,
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 288, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 36, in __init__
    self._init_executor()
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 82, in _init_executor
    self._run_workers("init_device")
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 161, in _run_workers
    ] + [output.get() for output in worker_outputs]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 161, in
    ] + [output.get() for output in worker_outputs]
         ^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 61, in get
    raise self.result.exception
IndexError: tuple index out of range
INFO 2025-04-14 21:05:24,880 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 11/120
INFO 2025-04-14 21:05:28,230 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 12/120
INFO 2025-04-14 21:05:31,527 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:56423/v1, this might take a moment... Attempt: 13/120
Task exception was never retrieved
future: exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
    while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/zmq/_future.py", line 372, in poll
    raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 701, in
    uvloop.run(run_server(args))
  File "/opt/app-root/lib64/python3.11/site-packages/uvloop/__init__.py", line 105, in run
    return runner.run(wrapper())
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/opt/app-root/lib64/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 667, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 117, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 222, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
INFO 2025-04-14 21:05:34,886 instructlab.model.backends.vllm:180: vLLM startup failed. Retrying (1/1)
ERROR 2025-04-14 21:05:34,886 instructlab.model.backends.vllm:185: vLLM failed to start.
INFO 2025-04-14 21:05:34,886 instructlab.model.backends.vllm:115: Trying to connect to model server at http://127.0.0.1:8000/v1
INFO 2025-04-14 21:05:36,301 instructlab.model.backends.vllm:332: vLLM starting up on pid 280 at http://127.0.0.1:36389/v1
INFO 2025-04-14 21:05:36,301 instructlab.model.backends.vllm:123: Starting a temporary vLLM server at http://127.0.0.1:36389/v1
INFO 2025-04-14 21:05:36,301 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment... Attempt: 1/120
INFO 2025-04-14 21:05:39,642 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment... Attempt: 2/120
WARNING 04-14 21:05:39 rocm.py:34] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn` instead.
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
INFO 04-14 21:05:41 api_server.py:643] vLLM API server version 0.6.4.post1
INFO 04-14 21:05:41 api_server.py:644] args: Namespace(host='127.0.0.1', port=36389, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=[LoRAModulePath(name='skill-classifier-v3-clm', path='/var/home/cloud-user/.cache/instructlab/models/skills-adapter-v3', base_model_name=None), LoRAModulePath(name='text-classifier-knowledge-v3-clm', path='/var/home/cloud-user/.cache/instructlab/models/knowledge-adapter-v3', base_model_name=None)], prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1', task='auto', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=, dtype='bfloat16', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='xgrammar', logits_processor_pattern=None, distributed_executor_backend='mp', worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=4, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=True, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, mm_cache_preprocessor=False, enable_lora=True, enable_lora_bias=False, max_loras=1, max_lora_rank=64, lora_extra_vocab_size=256, lora_dtype='bfloat16', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=True, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', calculate_kv_scales=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False)
INFO 04-14 21:05:41 api_server.py:198] Started engine process with PID 300
INFO 2025-04-14 21:05:42,991 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment... Attempt: 3/120
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
INFO 2025-04-14 21:05:46,247 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment... Attempt: 4/120
INFO 04-14 21:05:47 config.py:444] This model supports multiple tasks: {'classify', 'score', 'reward', 'embed', 'generate'}. Defaulting to 'generate'.
INFO 2025-04-14 21:05:49,456 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment... Attempt: 5/120
INFO 04-14 21:05:52 config.py:444] This model supports multiple tasks: {'embed', 'generate', 'score', 'reward', 'classify'}. Defaulting to 'generate'.
INFO 2025-04-14 21:05:52,824 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment...
Attempt: 6/120
INFO 04-14 21:05:53 llm_engine.py:249] Initializing an LLM engine (v0.6.4.post1) with config: model='/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1', speculative_config=None, tokenizer='/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=True, chunked_prefill_enabled=False, use_async_output_proc=True, mm_cache_preprocessor=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"candidate_compile_sizes":[],"compile_sizes":[],"capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
WARNING 04-14 21:05:53 multiproc_worker_utils.py:312] Reducing Torch parallelism from 96 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 04-14 21:05:53 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
INFO 04-14 21:05:53 selector.py:134] Using ROCmFlashAttention backend.
INFO 2025-04-14 21:05:56,022 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment... Attempt: 7/120
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
/opt/app-root/lib64/python3.11/site-packages/vllm/connections.py:8: RuntimeWarning: Failed to read commit hash: No module named 'vllm._version'
  from vllm.version import __version__ as VLLM_VERSION
INFO 2025-04-14 21:05:59,358 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment... Attempt: 8/120
(VllmWorkerProcess pid=434) INFO 04-14 21:05:59 selector.py:134] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=433) INFO 04-14 21:05:59 selector.py:134] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=434) INFO 04-14 21:05:59 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
(VllmWorkerProcess pid=433) INFO 04-14 21:05:59 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
(VllmWorkerProcess pid=432) INFO 04-14 21:06:00 selector.py:134] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=432) INFO 04-14 21:06:00 multiproc_worker_utils.py:222] Worker ready; awaiting tasks
(VllmWorkerProcess pid=434) INFO 04-14 21:06:00 utils.py:1086] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=434) INFO 04-14 21:06:00 pynccl.py:69] vLLM is using nccl==2.20.5
INFO 04-14 21:06:00 utils.py:1086] Found nccl from library librccl.so.1
INFO 04-14 21:06:00 pynccl.py:69] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=433) INFO 04-14 21:06:00 utils.py:1086] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=433) INFO 04-14 21:06:00 pynccl.py:69] vLLM is using nccl==2.20.5
(VllmWorkerProcess pid=432) INFO 04-14 21:06:00 utils.py:1086] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=432) INFO 04-14 21:06:00 pynccl.py:69] vLLM is using nccl==2.20.5
INFO 04-14 21:06:01 shm_broadcast.py:255] vLLM message queue communication handle: Handle(connect_ip='127.0.0.1', local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_0b7d9f7c'), local_subscribe_port=48187, remote_subscribe_port=None)
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236] Traceback (most recent call last):
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 230, in _run_worker_process
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 180, in init_device
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     set_random_seed(self.model_config.seed)
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/utils.py", line 10, in set_random_seed
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     current_platform.seed_everything(seed)
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/platforms/interface.py", line 187, in seed_everything
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     torch.manual_seed(seed)
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_compile.py", line 31, in inner
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     return disable_fn(*args, **kwargs)
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/random.py", line 46, in manual_seed
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     torch.cuda.manual_seed_all(seed)
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 127, in manual_seed_all
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     _lazy_call(cb, seed_all=True)
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py", line 244, in _lazy_call
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     callable()
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 124, in cb
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     default_generator = torch.cuda.default_generators[i]
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(VllmWorkerProcess pid=433) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236] IndexError: tuple index out of range
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236] Traceback (most recent call last):
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 230, in _run_worker_process
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 180, in init_device
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     set_random_seed(self.model_config.seed)
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/utils.py", line 10, in set_random_seed
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     current_platform.seed_everything(seed)
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/platforms/interface.py", line 187, in seed_everything
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     torch.manual_seed(seed)
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_compile.py", line 31, in inner
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     return disable_fn(*args, **kwargs)
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/random.py", line 46, in manual_seed
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     torch.cuda.manual_seed_all(seed)
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 127, in manual_seed_all
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     _lazy_call(cb, seed_all=True)
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py", line 244, in _lazy_call
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     callable()
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 124, in cb
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     default_generator = torch.cuda.default_generators[i]
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(VllmWorkerProcess pid=432) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236] IndexError: tuple index out of range
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236] Exception in worker VllmWorkerProcess while processing method init_device.
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236] Traceback (most recent call last):
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 230, in _run_worker_process
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     output = executor(*args, **kwargs)
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]              ^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 180, in init_device
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     set_random_seed(self.model_config.seed)
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/model_executor/utils.py", line 10, in set_random_seed
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     current_platform.seed_everything(seed)
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/platforms/interface.py", line 187, in seed_everything
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     torch.manual_seed(seed)
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_compile.py", line 31, in inner
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     return disable_fn(*args, **kwargs)
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 600, in _fn
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     return fn(*args, **kwargs)
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]            ^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/random.py", line 46, in manual_seed
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     torch.cuda.manual_seed_all(seed)
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 127, in manual_seed_all
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     _lazy_call(cb, seed_all=True)
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py", line 244, in _lazy_call
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     callable()
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]   File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/random.py", line 124, in cb
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]     default_generator = torch.cuda.default_generators[i]
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236]                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^
(VllmWorkerProcess pid=434) ERROR 04-14 21:06:01 multiproc_worker_utils.py:236] IndexError: tuple index out of range
ERROR 04-14 21:06:01 engine.py:366] tuple index out of range
ERROR 04-14 21:06:01 engine.py:366] Traceback (most recent call last):
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
ERROR 04-14 21:06:01 engine.py:366]     engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
ERROR 04-14 21:06:01 engine.py:366]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
ERROR 04-14 21:06:01 engine.py:366]     return cls(ipc_path=ipc_path,
ERROR 04-14 21:06:01 engine.py:366]            ^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
ERROR 04-14 21:06:01 engine.py:366]     self.engine = LLMEngine(*args, **kwargs)
ERROR 04-14 21:06:01 engine.py:366]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 288, in __init__
ERROR 04-14 21:06:01 engine.py:366]     self.model_executor = executor_class(vllm_config=vllm_config, )
ERROR 04-14 21:06:01 engine.py:366]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
ERROR 04-14 21:06:01 engine.py:366]     super().__init__(*args, **kwargs)
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 36, in __init__
ERROR 04-14 21:06:01 engine.py:366]     self._init_executor()
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 82, in _init_executor
ERROR 04-14 21:06:01 engine.py:366]     self._run_workers("init_device")
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 161, in _run_workers
ERROR 04-14 21:06:01 engine.py:366]     ] + [output.get() for output in worker_outputs]
ERROR 04-14 21:06:01 engine.py:366]         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 161, in <listcomp>
ERROR 04-14 21:06:01 engine.py:366]     ] + [output.get() for output in worker_outputs]
ERROR 04-14 21:06:01 engine.py:366]          ^^^^^^^^^^^^
ERROR 04-14 21:06:01 engine.py:366]   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 61, in get
ERROR 04-14 21:06:01 engine.py:366]     raise self.result.exception
ERROR 04-14 21:06:01 engine.py:366] IndexError: tuple index out of range
ERROR 04-14 21:06:01 multiproc_worker_utils.py:123] Worker VllmWorkerProcess pid 432 died, exit code: -15
INFO 04-14 21:06:01 multiproc_worker_utils.py:127] Killing local vLLM worker processes
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 368, in run_mp_engine
    raise e
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 357, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 119, in from_engine_args
    return cls(ipc_path=ipc_path,
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 71, in __init__
    self.engine = LLMEngine(*args, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 288, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__
    super().__init__(*args, **kwargs)
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 36, in __init__
    self._init_executor()
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 82, in _init_executor
    self._run_workers("init_device")
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 161, in _run_workers
    ] + [output.get() for output in worker_outputs]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 161, in <listcomp>
    ] + [output.get() for output in worker_outputs]
    ^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 61, in get
    raise self.result.exception
IndexError: tuple index out of range
INFO 2025-04-14 21:06:02,608 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment... Attempt: 9/120
INFO 2025-04-14 21:06:05,926 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment...
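[Editor's note] All three tensor-parallel workers die at the same frame: `torch.cuda.manual_seed_all(seed)` indexes `torch.cuda.default_generators[i]` and hits `IndexError: tuple index out of range`, which happens when the per-device generator tuple holds fewer entries than the device count the seeding loop iterates over (typically because a `spawn`ed worker initialized with no visible GPUs). The torch-free sketch below is a hypothetical illustration of that loop, not InstructLab or PyTorch code; `DeviceGenerator` and `manual_seed_all` are invented stand-ins:

```python
class DeviceGenerator:
    """Stand-in for a per-GPU torch.Generator (hypothetical)."""
    def __init__(self) -> None:
        self.seed = None

    def manual_seed(self, seed: int) -> None:
        self.seed = seed


def manual_seed_all(seed: int, device_count: int, default_generators: tuple) -> None:
    # Mirrors the shape of torch.cuda.manual_seed_all: one lazily built
    # generator is expected per device the runtime claims to expose.
    for i in range(device_count):
        # Raises IndexError when the generator tuple was populated for
        # fewer devices than device_count reports -- the failure in the log.
        default_generators[i].manual_seed(seed)


# Healthy case: generator tuple matches the reported device count.
manual_seed_all(0, 2, (DeviceGenerator(), DeviceGenerator()))

# Failure mode from the log: the worker's generator tuple is empty while a
# non-zero device count is still assumed.
try:
    manual_seed_all(0, 4, ())
except IndexError as e:
    print(e)  # tuple index out of range
```

Under this reading, the `IndexError` is a symptom of a device-visibility mismatch inside the spawned worker rather than a seeding bug per se.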
Attempt: 10/120
Task exception was never retrieved
future: exception=ZMQError('Operation not supported')>
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/client.py", line 184, in run_output_handler_loop
    while await self.output_socket.poll(timeout=VLLM_RPC_TIMEOUT
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/zmq/_future.py", line 372, in poll
    raise _zmq.ZMQError(_zmq.ENOTSUP)
zmq.error.ZMQError: Operation not supported
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 701, in <module>
    uvloop.run(run_server(args))
  File "/opt/app-root/lib64/python3.11/site-packages/uvloop/__init__.py", line 105, in run
    return runner.run(wrapper())
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/opt/app-root/lib64/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 667, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 117, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 222, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
INFO 2025-04-14 21:06:09,197 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:36389/v1, this might take a moment... Attempt: 11/120
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
failed to generate data with exception: Failed to start server: vLLM failed to start.
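[Editor's note] Before retrying, it is worth checking that the environment the spawned workers inherit actually exposes the GPUs. The variable names below (`HIP_VISIBLE_DEVICES`, `ROCR_VISIBLE_DEVICES`, `CUDA_VISIBLE_DEVICES`) are real ROCm/CUDA conventions, but the helper itself is a hypothetical illustration, not part of InstructLab or vLLM:

```python
import os

# Device-visibility variables: if one of these is set but empty, child
# processes started with `spawn` can see zero GPUs even though the host
# has eight MI300X devices.
VISIBILITY_VARS = ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES")


def suspicious_visibility(env: dict) -> list:
    """Return the visibility vars that are set yet select no devices."""
    return [v for v in VISIBILITY_VARS if v in env and not env[v].strip()]


print(suspicious_visibility({"HIP_VISIBLE_DEVICES": ""}))  # ['HIP_VISIBLE_DEVICES']
print(suspicious_visibility(dict(os.environ)) or "visibility vars look sane")
```

This only inspects the environment; confirming that the container actually passes the GPU devices through (e.g. with `rocm-smi` or `torch.cuda.device_count()` inside the same container) is the complementary check.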