INFO 2024-11-22 20:14:55,311 instructlab.model.serve_backend:56: Using model '/home/cloud-user/.cache/instructlab/models/granite-8b-lab-v1' with -1 gpu-layers and 4096 max context size.
INFO 2024-11-22 20:14:55,311 instructlab.model.serve_backend:88: '--gpus' flag used alongside '--tensor-parallel-size' in the vllm_args section of the config file. Using value of the --gpus flag.
INFO 2024-11-22 20:14:55,313 instructlab.model.backends.vllm:313: vLLM starting up on pid 52 at http://127.0.0.1:8000/v1
INFO 11-22 20:15:00 api_server.py:526] vLLM API server version 0.6.2
INFO 11-22 20:15:00 api_server.py:527] args: Namespace(host='127.0.0.1', port=8000, uvicorn_log_level='info', allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template='/tmp/tmp7ty9nvzm', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_auto_tool_choice=False, tool_call_parser=None, model='/home/cloud-user/.cache/instructlab/models/granite-8b-lab-v1', tokenizer=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, download_dir=None, load_format='auto', config_format='auto', dtype='auto', kv_cache_dtype='auto', quantization_param_path=None, max_model_len=None, guided_decoding_backend='outlines', distributed_executor_backend='mp', worker_use_ray=False, pipeline_parallel_size=1, tensor_parallel_size=4, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=16, enable_prefix_caching=False, disable_sliding_window=False, use_v2_block_manager=False, num_lookahead_slots=0, seed=0, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_seqs=256, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, enforce_eager=False, max_context_len_to_capture=None, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, enable_lora=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=False, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=None, qlora_adapter_name_or_path=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, override_neuron_config=None, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False)
INFO 11-22 20:15:00 api_server.py:164] Multiprocessing frontend to use ipc:///tmp/2d04d161-8541-405a-9216-2838d6db9413 for IPC Path.
INFO 11-22 20:15:00 api_server.py:177] Started engine process with PID 56
INFO 11-22 20:15:00 config.py:1652] Downcasting torch.float32 to torch.float16.
INFO 11-22 20:15:03 config.py:1652] Downcasting torch.float32 to torch.float16.
INFO 11-22 20:15:03 llm_engine.py:226] Initializing an LLM engine (v0.6.2) with config: model='/home/cloud-user/.cache/instructlab/models/granite-8b-lab-v1', speculative_config=None, tokenizer='/home/cloud-user/.cache/instructlab/models/granite-8b-lab-v1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=4096, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=4, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=/home/cloud-user/.cache/instructlab/models/granite-8b-lab-v1, use_v2_block_manager=False, num_scheduler_steps=1, multi_step_stream_outputs=False, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=True, mm_processor_kwargs=None)
WARNING 11-22 20:15:03 multiproc_gpu_executor.py:53] Reducing Torch parallelism from 64 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 11-22 20:15:03 custom_cache_manager.py:17] Setting Triton cache manager to: vllm.triton_utils.custom_cache_manager:CustomCacheManager
(VllmWorkerProcess pid=125) Process VllmWorkerProcess:
(VllmWorkerProcess pid=123) Process VllmWorkerProcess:
(VllmWorkerProcess pid=124) Process VllmWorkerProcess:
(VllmWorkerProcess pid=123) Traceback (most recent call last):
(VllmWorkerProcess pid=123)   File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=123)     self.run()
(VllmWorkerProcess pid=123)   File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=123)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 213, in _run_worker_process
(VllmWorkerProcess pid=123)     worker = worker_factory()
(VllmWorkerProcess pid=123)              ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/gpu_executor.py", line 24, in create_worker
(VllmWorkerProcess pid=123)     wrapper.init_worker(**kwargs)
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker_base.py", line 449, in init_worker
(VllmWorkerProcess pid=123)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=123)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 99, in __init__
(VllmWorkerProcess pid=123)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=123)                                             ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/model_runner.py", line 977, in __init__
(VllmWorkerProcess pid=123)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=123)                         ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/attention/selector.py", line 108, in get_attn_backend
(VllmWorkerProcess pid=123)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=123)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/attention/selector.py", line 248, in which_attn_to_use
(VllmWorkerProcess pid=123)     from vllm.attention.backends.flash_attn import ( # noqa: F401
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/attention/backends/flash_attn.py", line 31, in <module>
(VllmWorkerProcess pid=123)     @torch.library.custom_op("vllm::flash_attn_varlen_func", mutates_args=[])
(VllmWorkerProcess pid=123)      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/torch/_library/custom_ops.py", line 123, in inner
(VllmWorkerProcess pid=123)     result = CustomOpDef(namespace, opname, schema_str, fn)
(VllmWorkerProcess pid=123)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/torch/_library/custom_ops.py", line 169, in __init__
(VllmWorkerProcess pid=123)     self._register_to_dispatcher()
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/torch/_library/custom_ops.py", line 473, in _register_to_dispatcher
(VllmWorkerProcess pid=123)     lib._register_fake(self._name, fake_impl, _stacklevel=4)
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/torch/library.py", line 135, in _register_fake
(VllmWorkerProcess pid=123)     source = torch._library.utils.get_source(_stacklevel + 1)
(VllmWorkerProcess pid=123)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/torch/_library/utils.py", line 42, in get_source
(VllmWorkerProcess pid=123)     frame = inspect.getframeinfo(sys._getframe(stacklevel))
(VllmWorkerProcess pid=123)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/usr/lib64/python3.11/inspect.py", line 1692, in getframeinfo
(VllmWorkerProcess pid=123)     lines, lnum = findsource(frame)
(VllmWorkerProcess pid=123)                   ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/usr/lib64/python3.11/inspect.py", line 1075, in findsource
(VllmWorkerProcess pid=123)     module = getmodule(object, file)
(VllmWorkerProcess pid=123)              ^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/usr/lib64/python3.11/inspect.py", line 998, in getmodule
(VllmWorkerProcess pid=124) Traceback (most recent call last):
(VllmWorkerProcess pid=123)     f = getabsfile(module)
(VllmWorkerProcess pid=123)         ^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/usr/lib64/python3.11/inspect.py", line 967, in getabsfile
(VllmWorkerProcess pid=123)     _filename = getsourcefile(object) or getfile(object)
(VllmWorkerProcess pid=123)                 ^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "/usr/lib64/python3.11/inspect.py", line 949, in getsourcefile
(VllmWorkerProcess pid=123)     if os.path.exists(filename):
(VllmWorkerProcess pid=123)        ^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=123)   File "", line 19, in exists
(VllmWorkerProcess pid=123)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 384, in signal_handler
(VllmWorkerProcess pid=123)     raise KeyboardInterrupt("MQLLMEngine terminated")
(VllmWorkerProcess pid=123) KeyboardInterrupt: MQLLMEngine terminated
(VllmWorkerProcess pid=124)   File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap
(VllmWorkerProcess pid=124)     self.run()
(VllmWorkerProcess pid=124)   File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run
(VllmWorkerProcess pid=124)     self._target(*self._args, **self._kwargs)
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_worker_utils.py", line 213, in _run_worker_process
(VllmWorkerProcess pid=124)     worker = worker_factory()
(VllmWorkerProcess pid=124)              ^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/gpu_executor.py", line 24, in create_worker
(VllmWorkerProcess pid=124)     wrapper.init_worker(**kwargs)
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker_base.py", line 449, in init_worker
(VllmWorkerProcess pid=124)     self.worker = worker_class(*args, **kwargs)
(VllmWorkerProcess pid=124)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 99, in __init__
(VllmWorkerProcess pid=124)     self.model_runner: GPUModelRunnerBase = ModelRunnerClass(
(VllmWorkerProcess pid=124)                                             ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/model_runner.py", line 977, in __init__
(VllmWorkerProcess pid=124)     self.attn_backend = get_attn_backend(
(VllmWorkerProcess pid=124)                         ^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/attention/selector.py", line 108, in get_attn_backend
(VllmWorkerProcess pid=124)     backend = which_attn_to_use(num_heads, head_size, num_kv_heads,
(VllmWorkerProcess pid=124)               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/attention/selector.py", line 248, in which_attn_to_use
(VllmWorkerProcess pid=124)     from vllm.attention.backends.flash_attn import ( # noqa: F401
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/vllm/attention/backends/flash_attn.py", line 31, in <module>
(VllmWorkerProcess pid=124)     @torch.library.custom_op("vllm::flash_attn_varlen_func", mutates_args=[])
(VllmWorkerProcess pid=124)      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/torch/_library/custom_ops.py", line 123, in inner
(VllmWorkerProcess pid=124)     result = CustomOpDef(namespace, opname, schema_str, fn)
(VllmWorkerProcess pid=124)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/torch/_library/custom_ops.py", line 169, in __init__
(VllmWorkerProcess pid=124)     self._register_to_dispatcher()
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/torch/_library/custom_ops.py", line 473, in _register_to_dispatcher
(VllmWorkerProcess pid=124)     lib._register_fake(self._name, fake_impl, _stacklevel=4)
(VllmWorkerProcess pid=124)   File "/opt/app-root/lib64/python3.11/site-packages/torch/library.py", line 135, in _register_fake
(VllmWorkerProcess pid=124)     source = torch._library.utils.get_source(_stacklevel + 1)
(VllmWorkerProcess pid=124)              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorkerProcess pid=124) File "/opt/app-root/lib64/python3.11/site-packages/torch/_library/utils.py", line 42, in get_source (VllmWorkerProcess pid=124) frame = inspect.getframeinfo(sys._getframe(stacklevel)) (VllmWorkerProcess pid=124) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorkerProcess pid=124) File "/usr/lib64/python3.11/inspect.py", line 1692, in getframeinfo (VllmWorkerProcess pid=124) lines, lnum = findsource(frame) (VllmWorkerProcess pid=124) ^^^^^^^^^^^^^^^^^ (VllmWorkerProcess pid=124) File "/usr/lib64/python3.11/inspect.py", line 1075, in findsource (VllmWorkerProcess pid=124) module = getmodule(object, file) (VllmWorkerProcess pid=124) ^^^^^^^^^^^^^^^^^^^^^^^ (VllmWorkerProcess pid=124) File "/usr/lib64/python3.11/inspect.py", line 1001, in getmodule (VllmWorkerProcess pid=124) os.path.realpath(f)] = module.__name__ (VllmWorkerProcess pid=124) ^^^^^^^^^^^^^^^^^^^ (VllmWorkerProcess pid=124) File "", line 416, in realpath (VllmWorkerProcess pid=124) File "", line 451, in _joinrealpath (VllmWorkerProcess pid=124) File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 384, in signal_handler (VllmWorkerProcess pid=124) raise KeyboardInterrupt("MQLLMEngine terminated") (VllmWorkerProcess pid=124) KeyboardInterrupt: MQLLMEngine terminated ERROR 11-22 20:15:03 multiproc_worker_utils.py:120] Worker VllmWorkerProcess pid 124 died, exit code: 1 INFO 11-22 20:15:03 multiproc_worker_utils.py:124] Killing local vLLM worker processes Process SpawnProcess-1: Traceback (most recent call last): File "/usr/lib64/python3.11/multiprocessing/process.py", line 314, in _bootstrap self.run() File "/usr/lib64/python3.11/multiprocessing/process.py", line 108, in run self._target(*self._args, **self._kwargs) File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 388, in run_mp_engine engine = MQLLMEngine.from_engine_args(engine_args=engine_args, ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 138, in from_engine_args return cls( ^^^^ File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/engine.py", line 78, in __init__ self.engine = LLMEngine(*args, ^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/llm_engine.py", line 325, in __init__ self.model_executor = executor_class( ^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/distributed_gpu_executor.py", line 26, in __init__ super().__init__(*args, **kwargs) File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/executor_base.py", line 47, in __init__ self._init_executor() File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 110, in _init_executor self._run_workers("init_device") File "/opt/app-root/lib64/python3.11/site-packages/vllm/executor/multiproc_gpu_executor.py", line 185, in _run_workers driver_worker_output = driver_worker_method(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/app-root/lib64/python3.11/site-packages/vllm/worker/worker.py", line 166, in init_device torch.cuda.set_device(self.device) File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py", line 420, in set_device torch._C._cuda_setDevice(device) File "/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py", line 314, in _lazy_init torch._C._cuda_init() RuntimeError: Unexpected error from 
Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 571, in <module>
    uvloop.run(run_server(args))
  File "/opt/app-root/lib64/python3.11/site-packages/uvloop/__init__.py", line 105, in run
    return runner.run(wrapper())
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/opt/app-root/lib64/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 538, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 105, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 192, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start
^CINFO 2024-11-22 20:15:24,152 instructlab.model.backends.vllm:82: vLLM server terminated by keyboard
INFO 2024-11-22 20:15:24,153 instructlab.model.backends.vllm:475: Waiting for GPU VRAM reclamation...
/opt/app-root/lib64/python3.11/site-packages/torch/cuda/__init__.py:128: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination (Triggered internally at /mount/work-dir/torch-2.4.1/torch-2.4.1/c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
^CINFO 2024-11-22 20:15:31,420 instructlab.model.serve_backend:117: Server terminated by keyboard
Traceback (most recent call last):
  File "/opt/app-root/bin/ilab", line 8, in <module>
    sys.exit(ilab())
             ^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/clickext.py", line 323, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/cli/model/serve.py", line 104, in serve
    serve_backend(
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/serve_backend.py", line 119, in serve_backend
    backend_instance.shutdown()
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/vllm.py", line 194, in shutdown
    shutdown_process(self.process, 20)
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/vllm.py", line 442, in shutdown_process
    process_group_id = os.getpgid(process.pid)
                       ^^^^^^^^^^^^^^^^^^^^^^^
ProcessLookupError: [Errno 3] No such process
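
The log bottoms out in CUDA Error 803 ("system has unsupported display driver / cuda driver combination"), raised from torch._C._cuda_init() before any model weights are loaded; the MQLLMEngine termination, the "Engine process failed to start" error, and the final ProcessLookupError during shutdown are all downstream of that first failure. The following is a minimal sketch, not taken from the log, for confirming the driver/runtime mismatch independently of InstructLab and vLLM; it assumes it is run inside the same container or virtual environment that runs `ilab model serve`.

    # check_cuda.py -- hypothetical helper, assuming the same environment as `ilab model serve`.
    # It forces the same torch._C._cuda_init() path that each VllmWorkerProcess hit above.
    import torch

    print("torch version:        ", torch.__version__)
    print("built for CUDA runtime:", torch.version.cuda)  # CUDA runtime this torch build targets
    try:
        torch.cuda.init()  # triggers CUDA driver initialization / cudaGetDeviceCount()
        print("visible CUDA devices:", torch.cuda.device_count())
    except RuntimeError as exc:
        # Error 803 here indicates the host NVIDIA driver and the CUDA runtime inside the
        # image are incompatible (typically a driver older than the runtime requires).
        print("CUDA init failed:", exc)

If this snippet reproduces Error 803, comparing the host driver version reported by nvidia-smi against the CUDA runtime printed above (and updating the host driver or choosing an image built for that driver) is the usual next step; that remediation is an inference from the error text, not something the log itself prescribes.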