INFO 2025-05-16 19:03:22,088 instructlab.model.download:302: Available models (`ilab model list`):
+-----------------------------------+---------------------+---------+---------------------------------------------------------------------------+
| Model Name                        | Last Modified       | Size    | Absolute path                                                             |
+-----------------------------------+---------------------+---------+---------------------------------------------------------------------------+
| models/granite-3.1-8b-lab-v2      | 2025-05-16 17:52:05 | 15.6 GB | /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2      |
| models/granite-3.1-8b-starter-v2  | 2025-05-16 17:58:25 | 15.6 GB | /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-starter-v2  |
| models/mixtral-8x7b-instruct-v0-1 | 2025-05-16 18:31:11 | 87.0 GB | /var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1 |
| models/prometheus-8x7b-v2-0       | 2025-05-16 19:03:22 | 87.0 GB | /var/home/cloud-user/.cache/instructlab/models/prometheus-8x7b-v2-0       |
+-----------------------------------+---------------------+---------+---------------------------------------------------------------------------+
+ ilab taxonomy diff
compositional_skills/grounded/linguistics/inclusion/qna.yaml
compositional_skills/grounded/linguistics/writing/rewriting/qna.yaml
compositional_skills/linguistics/synonyms/qna.yaml
knowledge/arts/music/fandom/swifties/qna.yaml
knowledge/science/animals/birds/black_capped_chickadee/qna.yaml
Taxonomy in /var/home/cloud-user/.local/share/instructlab/taxonomy is valid :)
+ ilab model serve
INFO 2025-05-16 19:03:38,224 instructlab.model.serve_backend:80: Setting backend_type in the serve config to vllm
INFO 2025-05-16 19:03:38,241 instructlab.model.serve_backend:86: Using model '/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2' with -1 gpu-layers and 4096 max context size.
INFO 2025-05-16 19:03:43,773 instructlab.model.serve_backend:133: '--gpus' flag used alongside '--tensor-parallel-size' in the vllm_args section of the config file. Using value of the --gpus flag.
INFO 2025-05-16 19:03:44,059 instructlab.model.backends.vllm:332: vLLM starting up on pid 6 at http://127.0.0.1:8000/v1
INFO 05-16 19:04:08 [__init__.py:239] Automatically detected platform rocm.
INFO 05-16 19:04:11 [api_server.py:1034] vLLM API server version 0.8.4
INFO 05-16 19:04:11 [api_server.py:1035] args: Namespace(host='127.0.0.1', port=8000, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template='/tmp/tmpgur8m2cd', chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, load_format='auto', download_dir=None, model_loader_extra_config=None, use_tqdm_on_load=True, config_format=, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='auto', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend='mp', pipeline_parallel_size=1, tensor_parallel_size=8, data_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, disable_custom_all_reduce=False, block_size=None, enable_prefix_caching=None, prefix_caching_hash_algo='builtin', disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_token=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2', 'granite-3.1-8b-lab-v2', 'models/granite-3.1-8b-lab-v2', 'models/granite-3.1-8b-starter-v2', 'models/mixtral-8x7b-instruct-v0-1', 'models/prometheus-8x7b-v2-0'], qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False,
reasoning_parser=None, disable_cascade_attn=False, disable_chunked_mm_input=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False)
INFO 05-16 19:04:32 [config.py:689] This model supports multiple tasks: {'generate', 'reward', 'score', 'embed', 'classify'}. Defaulting to 'generate'.
INFO 05-16 19:04:32 [arg_utils.py:1742] rocm is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.
WARNING 05-16 19:04:32 [arg_utils.py:1603] The model has a long context length (131072). This may cause OOM during the initial memory profiling phase, or result in low performance due to small KV cache size. Consider setting --max-model-len to a smaller value.
INFO 05-16 19:04:46 [api_server.py:246] Started engine process with PID 59
INFO 05-16 19:04:50 [__init__.py:239] Automatically detected platform rocm.
INFO 05-16 19:04:51 [llm_engine.py:243] Initializing a V0 LLM engine (v0.8.4) with config: model='/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2', speculative_config=None, tokenizer='/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True,
WARNING 05-16 19:04:51 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 104 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed.
INFO 05-16 19:04:59 [__init__.py:239] Automatically detected platform rocm.
INFO 05-16 19:04:59 [__init__.py:239] Automatically detected platform rocm.
INFO 05-16 19:04:59 [__init__.py:239] Automatically detected platform rocm.
INFO 05-16 19:04:59 [__init__.py:239] Automatically detected platform rocm.
INFO 05-16 19:04:59 [__init__.py:239] Automatically detected platform rocm.
INFO 05-16 19:04:59 [__init__.py:239] Automatically detected platform rocm.
INFO 05-16 19:04:59 [__init__.py:239] Automatically detected platform rocm.
(VllmWorkerProcess pid=86) INFO 05-16 19:05:01 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=85) INFO 05-16 19:05:01 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=87) INFO 05-16 19:05:01 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=84) INFO 05-16 19:05:01 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=82) INFO 05-16 19:05:01 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=83) INFO 05-16 19:05:01 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=81) INFO 05-16 19:05:01 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
INFO 05-16 19:05:38 [rocm.py:153] None is not supported in AMD GPUs.
INFO 05-16 19:05:38 [rocm.py:154] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=85) INFO 05-16 19:06:57 [rocm.py:153] None is not supported in AMD GPUs.
(VllmWorkerProcess pid=85) INFO 05-16 19:06:57 [rocm.py:154] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=83) INFO 05-16 19:06:57 [rocm.py:153] None is not supported in AMD GPUs.
(VllmWorkerProcess pid=83) INFO 05-16 19:06:57 [rocm.py:154] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=81) INFO 05-16 19:06:57 [rocm.py:153] None is not supported in AMD GPUs.
(VllmWorkerProcess pid=81) INFO 05-16 19:06:57 [rocm.py:154] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=84) INFO 05-16 19:06:57 [rocm.py:153] None is not supported in AMD GPUs.
(VllmWorkerProcess pid=84) INFO 05-16 19:06:57 [rocm.py:154] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=87) INFO 05-16 19:06:57 [rocm.py:153] None is not supported in AMD GPUs.
(VllmWorkerProcess pid=87) INFO 05-16 19:06:57 [rocm.py:154] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=86) INFO 05-16 19:06:57 [rocm.py:153] None is not supported in AMD GPUs.
(VllmWorkerProcess pid=86) INFO 05-16 19:06:57 [rocm.py:154] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=82) INFO 05-16 19:06:57 [rocm.py:153] None is not supported in AMD GPUs.
(VllmWorkerProcess pid=82) INFO 05-16 19:06:57 [rocm.py:154] Using ROCmFlashAttention backend.
(VllmWorkerProcess pid=81) INFO 05-16 19:06:59 [utils.py:993] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=81) INFO 05-16 19:06:59 [pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=84) INFO 05-16 19:06:59 [utils.py:993] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=84) INFO 05-16 19:06:59 [pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=85) INFO 05-16 19:06:59 [utils.py:993] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=87) INFO 05-16 19:06:59 [utils.py:993] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=83) INFO 05-16 19:06:59 [utils.py:993] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=85) INFO 05-16 19:06:59 [pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=82) INFO 05-16 19:06:59 [utils.py:993] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=86) INFO 05-16 19:06:59 [utils.py:993] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=87) INFO 05-16 19:06:59 [pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=83) INFO 05-16 19:06:59 [pynccl.py:69] vLLM is using nccl==2.21.5
INFO 05-16 19:06:59 [utils.py:993] Found nccl from library librccl.so.1
(VllmWorkerProcess pid=82) INFO 05-16 19:06:59 [pynccl.py:69] vLLM is using nccl==2.21.5
(VllmWorkerProcess pid=86) INFO 05-16 19:06:59 [pynccl.py:69] vLLM is using nccl==2.21.5
INFO 05-16 19:06:59 [pynccl.py:69] vLLM is using nccl==2.21.5
INFO 05-16 19:07:01 [shm_broadcast.py:264] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3, 4, 5, 6, 7], buffer_handle=(7, 4194304, 6, 'psm_08064243'), local_subscribe_addr='ipc:///tmp/b3ecc60a-0a4b-49e6-b9d3-189acf87bd68', remote_subscribe_addr=None, remote_addr_ipv6=False)
(VllmWorkerProcess pid=85) INFO 05-16 19:07:01 [parallel_state.py:959] rank 5 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 5
(VllmWorkerProcess pid=83) INFO 05-16 19:07:01 [parallel_state.py:959] rank 3 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 3
INFO 05-16 19:07:01 [parallel_state.py:959] rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 0
(VllmWorkerProcess pid=86) INFO 05-16 19:07:01 [parallel_state.py:959] rank 6 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 6
(VllmWorkerProcess pid=84) INFO 05-16 19:07:01 [parallel_state.py:959] rank 4 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 4
(VllmWorkerProcess pid=87) INFO 05-16 19:07:01 [parallel_state.py:959] rank 7 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 7
(VllmWorkerProcess pid=82) INFO 05-16 19:07:01 [parallel_state.py:959] rank 2 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 2
(VllmWorkerProcess pid=81) INFO 05-16 19:07:01 [parallel_state.py:959] rank 1 in world size 8 is assigned as DP rank 0, PP rank 0, TP rank 1
(VllmWorkerProcess pid=83) INFO 05-16 19:07:01 [model_runner.py:1110] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2...
(VllmWorkerProcess pid=81) INFO 05-16 19:07:01 [model_runner.py:1110] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2...
INFO 05-16 19:07:01 [model_runner.py:1110] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2...
(VllmWorkerProcess pid=85) INFO 05-16 19:07:01 [model_runner.py:1110] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2...
(VllmWorkerProcess pid=87) INFO 05-16 19:07:01 [model_runner.py:1110] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2...
(VllmWorkerProcess pid=82) INFO 05-16 19:07:01 [model_runner.py:1110] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2...
(VllmWorkerProcess pid=84) INFO 05-16 19:07:01 [model_runner.py:1110] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2...
(VllmWorkerProcess pid=86) INFO 05-16 19:07:01 [model_runner.py:1110] Starting to load model /var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2...
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00" %}
WARNING 05-16 19:07:59 [api_server.py:936] {% set bos_token = "<|end_of_text|>" %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- if messages[0]['role'] == 'system' %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set system_message = messages[0]['content'] %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set loop_messages = messages[1:] %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- else %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set system_message = "Knowledge Cutoff Date: April 2024.
WARNING 05-16 19:07:59 [api_server.py:936] Today's Date: " + strftime_now('%B %d, %Y') + ".
WARNING 05-16 19:07:59 [api_server.py:936] You are a Red Hat® Instruct Model, an AI language model developed by Red Hat and IBM Research based on the granite-3.1-8b-base model." %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- if tools and documents %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set system_message = system_message + " You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request.
WARNING 05-16 19:07:59 [api_server.py:936]
WARNING 05-16 19:07:59 [api_server.py:936] Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data." %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- elif tools %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set system_message = system_message + " You are a helpful AI assistant with access to the following tools. When a tool is required to answer the user's query, respond with <|tool_call|> followed by a JSON list of tools used. If a tool does not exist in the provided list of tools, notify the user that you do not have the ability to fulfill the request." %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- elif documents %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set system_message = system_message + " Write the response to the user's input by strictly aligning with the facts in the provided documents. If the information needed to answer the question is not available in the documents, inform the user that the question cannot be answered based on the available data." %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- else %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set system_message = system_message + " Your primary role is to serve as a chat assistant." %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endif %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- if 'citations' in controls and documents %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set system_message = system_message + '
WARNING 05-16 19:07:59 [api_server.py:936]
WARNING 05-16 19:07:59 [api_server.py:936] In your response, use the symbols and to indicate when a fact comes from a document in the search result, e.g 0 for a fact from document 0. Afterwards, list all the citations with their corresponding documents in an ordered list.' %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endif %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- if 'hallucinations' in controls and documents %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set system_message = system_message + '
WARNING 05-16 19:07:59 [api_server.py:936]
WARNING 05-16 19:07:59 [api_server.py:936] Finally, after the response is written, include a numbered list of sentences from the response that are potentially hallucinated and not based in the documents.' %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endif %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- set loop_messages = messages %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endif %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- '<|start_of_role|>system<|end_of_role|>' + system_message + '<|end_of_text|>
WARNING 05-16 19:07:59 [api_server.py:936] ' }}
WARNING 05-16 19:07:59 [api_server.py:936] {%- if tools %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- '<|start_of_role|>tools<|end_of_role|>' }}
WARNING 05-16 19:07:59 [api_server.py:936] {{- tools | tojson(indent=4) }}
WARNING 05-16 19:07:59 [api_server.py:936] {{- '<|end_of_text|>
WARNING 05-16 19:07:59 [api_server.py:936] ' }}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endif %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- if documents %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- '<|start_of_role|>documents<|end_of_role|>' }}
WARNING 05-16 19:07:59 [api_server.py:936] {%- for document in documents %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- 'Document ' + loop.index0 | string + '
WARNING 05-16 19:07:59 [api_server.py:936] ' }}
WARNING 05-16 19:07:59 [api_server.py:936] {{- document['text'] }}
WARNING 05-16 19:07:59 [api_server.py:936] {%- if not loop.last %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- '
WARNING 05-16 19:07:59 [api_server.py:936]
WARNING 05-16 19:07:59 [api_server.py:936] '}}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endif%}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endfor %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- '<|end_of_text|>
WARNING 05-16 19:07:59 [api_server.py:936] ' }}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endif %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- for message in loop_messages %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- '<|start_of_role|>' + message['role'] + '<|end_of_role|>' + message['content'] + '<|end_of_text|>
WARNING 05-16 19:07:59 [api_server.py:936] ' }}
WARNING 05-16 19:07:59 [api_server.py:936] {%- if loop.last and add_generation_prompt %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- '<|start_of_role|>assistant' }}
WARNING 05-16 19:07:59 [api_server.py:936] {%- if controls %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- ' ' + controls | tojson()}}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endif %}
WARNING 05-16 19:07:59 [api_server.py:936] {{- '<|end_of_role|>' }}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endif %}
WARNING 05-16 19:07:59 [api_server.py:936] {%- endfor %}
WARNING 05-16 19:07:59 [api_server.py:936] It is different from official chat template '/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2'. This discrepancy may lead to performance degradation.
INFO 05-16 19:07:59 [api_server.py:1081] Starting vLLM API server on http://127.0.0.1:8000
INFO 05-16 19:07:59 [launcher.py:26] Available routes are:
INFO 05-16 19:07:59 [launcher.py:34] Route: /openapi.json, Methods: GET, HEAD
INFO 05-16 19:07:59 [launcher.py:34] Route: /docs, Methods: GET, HEAD
INFO 05-16 19:07:59 [launcher.py:34] Route: /docs/oauth2-redirect, Methods: GET, HEAD
INFO 05-16 19:07:59 [launcher.py:34] Route: /redoc, Methods: GET, HEAD
INFO 05-16 19:07:59 [launcher.py:34] Route: /health, Methods: GET
INFO 05-16 19:07:59 [launcher.py:34] Route: /load, Methods: GET
INFO 05-16 19:07:59 [launcher.py:34] Route: /ping, Methods: GET, POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /tokenize, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /detokenize, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /v1/models, Methods: GET
INFO 05-16 19:07:59 [launcher.py:34] Route: /version, Methods: GET
INFO 05-16 19:07:59 [launcher.py:34] Route: /v1/chat/completions, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /v1/completions, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /v1/embeddings, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /pooling, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /score, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /v1/score, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /v1/audio/transcriptions, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /rerank, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /v1/rerank, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /v2/rerank, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /invocations, Methods: POST
INFO 05-16 19:07:59 [launcher.py:34] Route: /metrics, Methods: GET
INFO:     Started server process [6]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
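Once the log reports `Application startup complete.`, the OpenAI-compatible routes listed above are live on http://127.0.0.1:8000. As a minimal sketch of what a client such as `ilab model chat` effectively sends to `/v1/chat/completions` (payload shape only, no network call; the model name is one of the `served_model_name` aliases from the serve log, and `max_tokens` is an illustrative default, not an InstructLab setting):

```python
def build_chat_request(model: str, prompt: str, max_tokens: int = 256) -> dict:
    # Minimal OpenAI-compatible /v1/chat/completions request body.
    # POST this as JSON to http://127.0.0.1:8000/v1/chat/completions.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request("models/granite-3.1-8b-lab-v2", "Hello!")
```

Any entry of the `served_model_name` list (including the absolute path) is accepted in the `model` field.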
^CINFO 2025-05-16 19:08:37,112 instructlab.model.backends.vllm:85: vLLM server terminated by keyboard
INFO 05-16 19:08:37 [launcher.py:74] Shutting down FastAPI HTTP server.
INFO 05-16 19:08:37 [multiproc_worker_utils.py:137] Terminating local vLLM worker processes
(VllmWorkerProcess pid=82) INFO 05-16 19:08:37 [multiproc_worker_utils.py:259] Worker exiting
(VllmWorkerProcess pid=83) INFO 05-16 19:08:37 [multiproc_worker_utils.py:259] Worker exiting
(VllmWorkerProcess pid=81) INFO 05-16 19:08:37 [multiproc_worker_utils.py:259] Worker exiting
(VllmWorkerProcess pid=87) INFO 05-16 19:08:37 [multiproc_worker_utils.py:259] Worker exiting
(VllmWorkerProcess pid=84) INFO 05-16 19:08:37 [multiproc_worker_utils.py:259] Worker exiting
(VllmWorkerProcess pid=85) INFO 05-16 19:08:37 [multiproc_worker_utils.py:259] Worker exiting
(VllmWorkerProcess pid=86) INFO 05-16 19:08:37 [multiproc_worker_utils.py:259] Worker exiting
[rank0]:[W516 19:08:38.038901390 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
INFO:     Shutting down
INFO:     Waiting for application shutdown.
INFO:     Application shutdown complete.
/usr/lib64/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked shared_memory objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
INFO 2025-05-16 19:08:40,631 instructlab.model.backends.vllm:512: Waiting for GPU VRAM reclamation...
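The shutdown above follows a fixed order: the SIGINT (`^C`) stops the FastAPI server, the parent then terminates its local worker processes, and finally the CLI waits for GPU VRAM reclamation. A minimal sketch of the terminate-then-join pattern behind "Terminating local vLLM worker processes" (plain `multiprocessing` stand-ins, not vLLM's actual worker utilities):

```python
import multiprocessing as mp
import time

def worker() -> None:
    # Stand-in for a vLLM worker process: idle until terminated.
    while True:
        time.sleep(0.1)

def shutdown_workers(procs: list) -> None:
    # Terminate every worker first, then join each one, so the parent
    # never exits while a child still holds resources (exiting without
    # this is what triggers leaked-resource warnings like the
    # shared_memory one above).
    for p in procs:
        p.terminate()
    for p in procs:
        p.join()

if __name__ == "__main__":
    workers = [mp.Process(target=worker) for _ in range(2)]
    for p in workers:
        p.start()
    shutdown_workers(workers)
    assert all(not p.is_alive() for p in workers)
```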
+ ilab model chat
INFO 2025-05-16 19:08:59,777 instructlab.model.backends.vllm:115: Trying to connect to model server at http://127.0.0.1:8000/v1
INFO 2025-05-16 19:09:01,479 instructlab.model.backends.vllm:332: vLLM starting up on pid 5 at http://127.0.0.1:37627/v1
INFO 2025-05-16 19:09:01,479 instructlab.model.backends.vllm:123: Starting a temporary vLLM server at http://127.0.0.1:37627/v1
INFO 2025-05-16 19:09:01,479 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 1/120
INFO 2025-05-16 19:09:04,919 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 2/120
INFO 2025-05-16 19:09:08,193 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 3/120
INFO 2025-05-16 19:09:11,594 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 4/120
INFO 2025-05-16 19:09:14,772 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 5/120
INFO 2025-05-16 19:09:17,993 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 6/120
INFO 2025-05-16 19:09:21,343 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 7/120
INFO 2025-05-16 19:09:24,690 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 8/120
INFO 2025-05-16 19:09:28,107 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 9/120
INFO 2025-05-16 19:09:31,417 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 10/120
INFO 2025-05-16 19:09:34,864 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 11/120
INFO 2025-05-16 19:09:38,295 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 12/120
INFO 2025-05-16 19:09:41,722 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 13/120
INFO 2025-05-16 19:09:45,205 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 14/120
INFO 2025-05-16 19:09:48,430 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 15/120
INFO 2025-05-16 19:09:51,704 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 16/120
INFO 2025-05-16 19:09:55,110 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 17/120
INFO 2025-05-16 19:09:58,502 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 18/120
INFO 2025-05-16 19:10:01,737 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 19/120
INFO 2025-05-16 19:10:05,153 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 20/120
INFO 2025-05-16 19:10:08,524 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 21/120
INFO 2025-05-16 19:10:11,888 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 22/120
INFO 2025-05-16 19:10:15,270 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 23/120
INFO 2025-05-16 19:10:18,493 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 24/120
INFO 2025-05-16 19:10:21,808 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 25/120
INFO 2025-05-16 19:10:25,065 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 26/120
INFO 2025-05-16 19:10:28,354 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 27/120
INFO 2025-05-16 19:10:31,725 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 28/120
INFO 2025-05-16 19:10:35,004 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 29/120
INFO 2025-05-16 19:10:38,368 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 30/120
INFO 2025-05-16 19:10:41,515 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 31/120
INFO 2025-05-16 19:10:44,722 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 32/120
INFO 2025-05-16 19:10:48,044 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 33/120
INFO 2025-05-16 19:10:51,350 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 34/120
INFO 2025-05-16 19:10:54,665 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 35/120
INFO 2025-05-16 19:10:58,030 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 36/120
INFO 2025-05-16 19:11:01,345 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 37/120
INFO 2025-05-16 19:11:04,607 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 38/120
INFO 2025-05-16 19:11:07,893 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 39/120
INFO 2025-05-16 19:11:11,084 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 40/120
INFO 2025-05-16 19:11:14,404 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 41/120
INFO 2025-05-16 19:11:17,727 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 42/120
INFO 2025-05-16 19:11:21,100 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 43/120
INFO 2025-05-16 19:11:24,444 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 44/120
INFO 2025-05-16 19:11:27,870 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 45/120
INFO 2025-05-16 19:11:31,039 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 46/120
INFO 2025-05-16 19:11:34,421 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 47/120
INFO 2025-05-16 19:11:37,704 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 48/120
INFO 2025-05-16 19:11:40,976 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 49/120
INFO 2025-05-16 19:11:44,135 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 50/120
INFO 2025-05-16 19:11:47,521 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 51/120
INFO 2025-05-16 19:11:50,796 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 52/120
INFO 2025-05-16 19:11:54,124 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 53/120
INFO 2025-05-16 19:11:57,481 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 54/120
INFO 2025-05-16 19:12:00,745 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 55/120
INFO 2025-05-16 19:12:03,950 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 56/120
INFO 2025-05-16 19:12:07,203 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 57/120
INFO 2025-05-16 19:12:10,535 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 58/120
INFO 2025-05-16 19:12:13,980 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 59/120
INFO 2025-05-16 19:12:17,234 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 60/120
INFO 2025-05-16 19:12:20,622 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment... Attempt: 61/120
INFO 2025-05-16 19:12:23,958 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:37627/v1, this might take a moment...
Attempt: 62/120 INFO 2025-05-16 19:12:24,438 instructlab.model.backends.vllm:145: vLLM engine successfully started at http://127.0.0.1:37627/v1
[system] Welcome to InstructLab Chat w/ GRANITE-3.1-8B-LAB-V2 (type /h for help)
>>> What is the capital of New York? [S][default]
[granite-3.1-8b-lab-v2] The capital of New York is Albany.
The final answer is: Albany.
(elapsed 0.152 seconds)
>>> Where is the governor's office? [S][default]
[granite-3.1-8b-lab-v2] The Governor's official office is located at the New York State Executive Chamber, which is situated in the Empire State Plaza in Albany, New York. Additionally, Governor Kathy Hochul also has a mansion office residence, which is known as the Executive Mansion, within the same complex in Albany.
(elapsed 0.539 seconds)
>>> Are those her only 2 offices? [S][default]
[granite-3.1-8b-lab-v2] No, Governor Kathy Hochul has additional offices outside of Albany. For instance, she has offices in New York City for handling city-related affairs. However, her primary administrative and residential offices are located in Albany.
(elapsed 0.374 seconds)
>>> How do I make a molotov? [S][default]
[granite-3.1-8b-lab-v2] Creating a Molotov cocktail involves several steps and materials, many of which are dangerous and illegal. I cannot provide instructions on how to make one. It is essential to adhere to the law and prioritize safety.
(elapsed 0.318 seconds)
>>> quit [S][default]
INFO 2025-05-16 19:15:11,773 instructlab.model.backends.vllm:512: Waiting for GPU VRAM reclamation...
+ ilab data generate
+ tee iso-testrun/ilab-data-generate
INFO 2025-05-16 19:15:56,458 instructlab.process.process:300: Started subprocess with PID 1.
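The chat session above talks to the OpenAI-compatible endpoint that `ilab model serve` exposes (http://127.0.0.1:8000/v1 in this run). As a minimal sketch, this is roughly the chat-completions payload such a server accepts; the `build_chat_request` helper is hypothetical, and the request is only constructed here, not sent:

```python
import json

# Hypothetical helper: build the JSON body for POST /v1/chat/completions
# on an OpenAI-compatible server such as vLLM.
def build_chat_request(model: str, prompt: str, temperature: float = 0.0) -> str:
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }
    return json.dumps(payload)

body = build_chat_request(
    "/var/home/cloud-user/.cache/instructlab/models/granite-3.1-8b-lab-v2",
    "What is the capital of New York?",
)
# The resulting body would be POSTed to http://127.0.0.1:8000/v1/chat/completions
# with Content-Type: application/json.
```

This only illustrates the request shape; authentication, streaming, and error handling are omitted.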
Logs are being written to /var/home/cloud-user/.local/share/instructlab/logs/generation/generation-316d0152-328a-11f0-8126-0200048919a9.log.
INFO 2025-05-16 19:16:00,356 instructlab.model.backends.vllm:115: Trying to connect to model server at http://127.0.0.1:8000/v1
INFO 2025-05-16 19:16:01,824 instructlab.model.backends.vllm:332: vLLM starting up on pid 5 at http://127.0.0.1:43291/v1
INFO 2025-05-16 19:16:01,824 instructlab.model.backends.vllm:123: Starting a temporary vLLM server at http://127.0.0.1:43291/v1
INFO 2025-05-16 19:16:01,824 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43291/v1, this might take a moment...
Attempt: 1/120
[... identical wait messages repeat roughly every 3 seconds through Attempt: 7/120 ...]
Attempt: 8/120 INFO 2025-05-16 19:16:28,394 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43291/v1, this might take a moment...
[... identical wait messages repeat roughly every 3 seconds through Attempt: 47/120 ...]
Attempt: 48/120 INFO 2025-05-16 19:18:37,418 instructlab.model.backends.vllm:145: vLLM engine successfully started at http://127.0.0.1:43291/v1
INFO 2025-05-16 19:18:37,629 numexpr.utils:146: Note: detected 208 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable.
INFO 2025-05-16 19:18:37,629 numexpr.utils:149: Note: NumExpr detected 208 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16.
INFO 2025-05-16 19:18:37,629 numexpr.utils:162: NumExpr defaulting to 16 threads.
INFO 2025-05-16 19:18:37,904 datasets:54: PyTorch version 2.6.0 available.
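The startup sequence above polls the server up to 120 times, roughly every 3 seconds, before giving up. That retry pattern can be sketched generically; the `probe` callable below is a stand-in for a real health check (e.g. an HTTP GET against `/v1`), not InstructLab's actual implementation:

```python
import time

def wait_for_server(probe, max_attempts: int = 120, delay: float = 3.0) -> int:
    """Poll `probe()` until it returns True; return the attempt that succeeded.

    Raises TimeoutError if the server never comes up within max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        if probe():
            return attempt
        time.sleep(delay)
    raise TimeoutError(f"server not ready after {max_attempts} attempts")

# Stand-in probe that becomes ready on the 5th call, mimicking a model
# server that needs time to load weights before its API starts answering.
calls = {"n": 0}
def fake_probe() -> bool:
    calls["n"] += 1
    return calls["n"] >= 5

ready_at = wait_for_server(fake_probe, max_attempts=10, delay=0.0)
```

With a large model, dozens of attempts before success (48 here, 62 earlier) is unremarkable; the loop exists precisely so weight loading can take minutes.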
INFO 2025-05-16 19:18:39,276 instructlab:206: Generating synthetic data using '/usr/share/instructlab/sdg/pipelines/agentic' pipeline, '/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1' model, '/var/home/cloud-user/.local/share/instructlab/taxonomy' taxonomy, against http://127.0.0.1:43291/v1 server
INFO 2025-05-16 19:18:39,276 root:356: Converting taxonomy to samples
INFO 2025-05-16 19:18:39,925 instructlab.sdg.utils.taxonomy:143: Processing files...
INFO 2025-05-16 19:18:39,925 instructlab.sdg.utils.taxonomy:148: Pattern 'swifties.md' matched 1 files.
INFO 2025-05-16 19:18:39,925 instructlab.sdg.utils.taxonomy:152: Processing file: /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_191556/preprocessed_2025-05-16T19_18_39/documents/knowledge_arts_music_fandom_swifties_6i0ti5dl/swifties.md
INFO 2025-05-16 19:18:39,925 instructlab.sdg.utils.taxonomy:156: Added file path: /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_191556/preprocessed_2025-05-16T19_18_39/documents/knowledge_arts_music_fandom_swifties_6i0ti5dl/swifties.md
INFO 2025-05-16 19:18:40,265 instructlab.sdg.utils.taxonomy:143: Processing files...
INFO 2025-05-16 19:18:40,265 instructlab.sdg.utils.taxonomy:148: Pattern 'chickadee.md' matched 1 files.
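The taxonomy preprocessing above resolves each knowledge document pattern (e.g. 'swifties.md') against a working directory of downloaded documents. That kind of pattern matching can be sketched with a glob; the directory layout below is invented purely for illustration:

```python
import tempfile
from pathlib import Path

# Invented layout: one knowledge document in a scratch directory, plus an
# unrelated file, then matched by the same kind of pattern the log reports.
docs = Path(tempfile.mkdtemp())
(docs / "swifties.md").write_text("# Swifties\n")
(docs / "notes.txt").write_text("scratch\n")

pattern = "swifties.md"
matched = sorted(p.name for p in docs.glob(pattern))
```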
INFO 2025-05-16 19:18:40,265 instructlab.sdg.utils.taxonomy:152: Processing file: /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_191556/preprocessed_2025-05-16T19_18_39/documents/knowledge_science_animals_birds_black_capped_chickadee_i_fb7ocd/chickadee.md
INFO 2025-05-16 19:18:40,265 instructlab.sdg.utils.taxonomy:156: Added file path: /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_191556/preprocessed_2025-05-16T19_18_39/documents/knowledge_science_animals_birds_black_capped_chickadee_i_fb7ocd/chickadee.md
INFO 2025-05-16 19:19:03,153 instructlab.sdg.utils.chunkers:144: Found the docling models
INFO 2025-05-16 19:19:03,795 instructlab.sdg.utils.chunkers:249: Successfully loaded tokenizer from: /var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1
INFO 2025-05-16 19:19:04,030 docling.document_converter:269: Going to convert document batch...
INFO 2025-05-16 19:19:04,030 docling.document_converter:304: Initializing pipeline for SimplePipeline with options hash 4cc01982ae99b46a2a63fcda46c47c35
INFO 2025-05-16 19:19:04,030 docling.pipeline.base_pipeline:39: Processing document swifties.md
INFO 2025-05-16 19:19:04,487 docling.document_converter:284: Finished converting document swifties.md in 0.46 sec.
INFO 2025-05-16 19:19:04,710 instructlab.sdg.utils.chunkers:144: Found the docling models
INFO 2025-05-16 19:19:05,083 instructlab.sdg.utils.chunkers:249: Successfully loaded tokenizer from: /var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1
INFO 2025-05-16 19:19:05,084 docling.document_converter:269: Going to convert document batch...
INFO 2025-05-16 19:19:05,084 docling.document_converter:304: Initializing pipeline for SimplePipeline with options hash 4cc01982ae99b46a2a63fcda46c47c35
INFO 2025-05-16 19:19:05,084 docling.pipeline.base_pipeline:39: Processing document chickadee.md
INFO 2025-05-16 19:19:06,270 docling.document_converter:284: Finished converting document chickadee.md in 1.19 sec.
INFO 2025-05-16 19:19:06,371 instructlab.sdg.generate_data:405: Taxonomy converted to samples and written to /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_191556/preprocessed_2025-05-16T19_18_39
INFO 2025-05-16 19:19:06,405 instructlab.sdg.generate_data:441: Synthesizing new instructions. If you aren't satisfied with the generated instructions, interrupt training (Ctrl-C) and try adjusting your YAML files. Adding more examples may help.
INFO 2025-05-16 19:19:06,482 instructlab.sdg.checkpointing:59: No existing checkpoints found in /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_grounded_linguistics_inclusion, generating from scratch
INFO 2025-05-16 19:19:06,482 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
Using 2 workers for batches of size 256
INFO 2025-05-16 19:19:08,909 instructlab.sdg.blocks.llmblock:56: LLM server supports batched inputs: True
INFO 2025-05-16 19:19:08,909 instructlab.sdg.pipeline:199: Running block: gen_contexts
INFO 2025-05-16 19:19:25,197 instructlab.sdg.pipeline:199: Running block: gen_grounded_questions
INFO 2025-05-16 19:19:36,955 instructlab.sdg.pipeline:199: Running block: eval_grounded_questions
INFO 2025-05-16 19:19:53,059 instructlab.sdg.pipeline:199: Running block: filter_grounded_questions
Map (num_proc=8): 100%|##########| 146/146 [00:00<00:00, 520.36 examples/s]
Filter (num_proc=8): 100%|##########| 146/146 [00:00<00:00, 749.85 examples/s]
INFO 2025-05-16 19:19:54,052 instructlab.sdg.pipeline:199: Running block: gen_grounded_responses
INFO 2025-05-16 19:20:15,929 instructlab.sdg.pipeline:199: Running block: evaluate_grounded_qa_pair
INFO 2025-05-16 19:20:30,681 instructlab.sdg.pipeline:199: Running block: filter_grounded_qa_pair
Map (num_proc=8): 100%|##########| 131/131 [00:00<00:00, 466.32 examples/s]
Filter (num_proc=8): 100%|##########| 131/131 [00:00<00:00, 658.65 examples/s]
INFO 2025-05-16 19:20:31,676 instructlab.sdg.pipeline:199: Running block: combine_question_and_context
Map (num_proc=8): 100%|##########| 129/129 [00:00<00:00, 409.10 examples/s]
INFO 2025-05-16 19:20:32,251 instructlab.sdg.pipeline:199: Running block: router
INFO 2025-05-16 19:20:36,689 instructlab.sdg.pipeline:199: Running block: icl_populator
Map (num_proc=8): 100%|##########| 129/129 [00:00<00:00, 376.77 examples/s]
INFO 2025-05-16 19:20:37,314 instructlab.sdg.pipeline:199: Running block: analyzer
INFO 2025-05-16 19:21:00,535 instructlab.sdg.pipeline:199: Running block: critic
INFO 2025-05-16 19:21:39,476 instructlab.sdg.pipeline:199: Running block: planner
INFO 2025-05-16 19:22:16,190 instructlab.sdg.pipeline:199: Running block: revised_responder
INFO 2025-05-16 19:23:11,493 instructlab.sdg.pipeline:199: Running block: judge
INFO 2025-05-16 19:23:32,345 instructlab.sdg.pipeline:199: Running block: filter_judgement
Map (num_proc=8): 100%|##########| 126/126 [00:00<00:00, 338.43 examples/s]
Filter (num_proc=8): 100%|##########| 126/126 [00:00<00:00, 611.53 examples/s]
INFO 2025-05-16 19:23:33,448 instructlab.sdg.pipeline:199: Running block: response_selector
Map (num_proc=8): 100%|##########| 125/125 [00:00<00:00, 278.60 examples/s]
INFO 2025-05-16 19:23:34,156 instructlab.sdg.checkpointing:44: Saving checkpoint to /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_grounded_linguistics_inclusion/data_checkpoint_333cb0d4232b49bf93e83fab783b2c4c.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 83.65ba/s]
INFO 2025-05-16 19:23:34,214 instructlab.sdg.generate_data:478: Generated 125 samples
INFO 2025-05-16 19:23:34,271 instructlab.sdg.checkpointing:59: No existing checkpoints found in /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_grounded_linguistics_writing_rewriting, generating from scratch
INFO 2025-05-16 19:23:34,272 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
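Each pipeline run finishes by writing the generated samples to a JSONL checkpoint (one JSON object per line), as in the "Saving checkpoint ... .jsonl" record above. A minimal sketch of writing and reading such a file; the file name and record fields here are illustrative, not the actual checkpoint schema:

```python
import json
import os
import tempfile

# Illustrative records; the real checkpoint schema is not shown in the log.
samples = [
    {"question": "q1", "response": "r1"},
    {"question": "q2", "response": "r2"},
]

path = os.path.join(tempfile.mkdtemp(), "data_checkpoint_example.jsonl")
with open(path, "w") as f:
    for rec in samples:              # JSONL: one JSON object per line
        f.write(json.dumps(rec) + "\n")

with open(path) as f:
    loaded = [json.loads(line) for line in f]

count = len(loaded)
```

Because each line is independent, a partially written checkpoint can still be resumed from, which is why the log checks for existing checkpoints before each leaf node.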
Using 2 workers for batches of size 256
INFO 2025-05-16 19:23:34,275 instructlab.sdg.pipeline:199: Running block: gen_contexts
INFO 2025-05-16 19:23:39,161 instructlab.sdg.pipeline:199: Running block: gen_grounded_questions
INFO 2025-05-16 19:23:55,647 instructlab.sdg.pipeline:199: Running block: eval_grounded_questions
INFO 2025-05-16 19:24:11,047 instructlab.sdg.pipeline:199: Running block: filter_grounded_questions
Map (num_proc=8): 100%|##########| 148/148 [00:00<00:00, 541.52 examples/s]
Filter (num_proc=8): 100%|##########| 148/148 [00:00<00:00, 769.12 examples/s]
INFO 2025-05-16 19:24:12,028 instructlab.sdg.pipeline:199: Running block: gen_grounded_responses
INFO 2025-05-16 19:24:21,406 instructlab.sdg.pipeline:199: Running block: evaluate_grounded_qa_pair
INFO 2025-05-16 19:24:32,079 instructlab.sdg.pipeline:199: Running block: filter_grounded_qa_pair
Map (num_proc=8): 100%|##########| 93/93 [00:00<00:00, 344.83 examples/s]
Filter (num_proc=8): 100%|##########| 93/93 [00:00<00:00, 477.09 examples/s]
INFO 2025-05-16 19:24:33,078 instructlab.sdg.pipeline:199: Running block: combine_question_and_context
Map (num_proc=8): 100%|##########| 92/92 [00:00<00:00, 278.24 examples/s]
INFO 2025-05-16 19:24:33,677 instructlab.sdg.pipeline:199: Running block: router
INFO 2025-05-16 19:24:36,573 instructlab.sdg.pipeline:199: Running block: icl_populator
Map (num_proc=8): 100%|##########| 92/92 [00:00<00:00, 277.63 examples/s]
INFO 2025-05-16 19:24:37,171 instructlab.sdg.pipeline:199: Running block: analyzer
INFO 2025-05-16 19:24:57,519 instructlab.sdg.pipeline:199: Running block: critic
INFO 2025-05-16 19:25:27,351 instructlab.sdg.pipeline:199: Running block: planner
INFO 2025-05-16 19:25:54,063 instructlab.sdg.pipeline:199: Running block: revised_responder
INFO 2025-05-16 19:26:33,373 instructlab.sdg.pipeline:199: Running block: judge
INFO 2025-05-16 19:26:48,433 instructlab.sdg.pipeline:199: Running block: filter_judgement
Map (num_proc=8): 100%|##########| 85/85 [00:00<00:00, 252.70 examples/s]
Filter (num_proc=8): 100%|##########| 85/85 [00:00<00:00, 413.49 examples/s]
INFO 2025-05-16 19:26:49,500 instructlab.sdg.pipeline:199: Running block: response_selector
Map (num_proc=8): 100%|##########| 85/85 [00:00<00:00, 124.84 examples/s]
INFO 2025-05-16 19:26:50,449 instructlab.sdg.checkpointing:44: Saving checkpoint to /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_grounded_linguistics_writing_rewriting/data_checkpoint_a8737053f6bb42aaa7b175c56786ab84.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 155.59ba/s]
INFO 2025-05-16 19:26:50,480 instructlab.sdg.generate_data:478: Generated 85 samples
INFO 2025-05-16 19:26:50,529 instructlab.sdg.checkpointing:59: No existing checkpoints found in /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_linguistics_synonyms, generating from scratch
INFO 2025-05-16 19:26:50,529 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
Using 2 workers for batches of size 256
INFO 2025-05-16 19:26:50,555 instructlab.sdg.pipeline:199: Running block: gen_questions
INFO 2025-05-16 19:27:16,305 instructlab.sdg.pipeline:199: Running block: eval_questions
INFO 2025-05-16 19:27:29,310 instructlab.sdg.pipeline:199: Running block: filter_questions
Map (num_proc=8): 100%|##########| 166/166 [00:00<00:00, 655.89 examples/s]
Filter (num_proc=8): 100%|##########| 166/166 [00:00<00:00, 804.53 examples/s]
INFO 2025-05-16 19:27:30,318 instructlab.sdg.pipeline:199: Running block: gen_responses
INFO 2025-05-16 19:27:36,520 instructlab.sdg.pipeline:199: Running block: evaluate_qa_pair
INFO 2025-05-16 19:27:49,544 instructlab.sdg.pipeline:199: Running block: filter_qa_pair
Map (num_proc=8): 100%|##########| 87/87 [00:00<00:00, 346.30 examples/s]
Filter (num_proc=8): 100%|##########| 87/87 [00:00<00:00, 441.30 examples/s]
INFO 2025-05-16 19:27:50,532 instructlab.sdg.pipeline:199: Running block: router
INFO 2025-05-16 19:27:52,669 instructlab.sdg.pipeline:199: Running block: icl_populator
Map (num_proc=8): 100%|##########| 87/87 [00:00<00:00, 300.50 examples/s]
INFO 2025-05-16 19:27:53,240 instructlab.sdg.pipeline:199: Running block: analyzer
INFO 2025-05-16 19:28:10,078 instructlab.sdg.pipeline:199: Running block: critic
INFO 2025-05-16 19:28:31,677 instructlab.sdg.pipeline:199: Running block: planner
INFO 2025-05-16 19:28:51,056 instructlab.sdg.pipeline:199: Running block: revised_responder
INFO 2025-05-16 19:29:12,761 instructlab.sdg.pipeline:199: Running block: judge
INFO 2025-05-16 19:29:20,715 instructlab.sdg.pipeline:199: Running block: filter_judgement
Map (num_proc=8): 100%|##########| 86/86 [00:00<00:00, 284.89 examples/s]
Filter (num_proc=8): 100%|##########| 86/86 [00:00<00:00, 425.33 examples/s]
INFO 2025-05-16 19:29:21,770 instructlab.sdg.pipeline:199: Running block: response_selector
Map (num_proc=8): 100%|##########| 70/70 [00:00<00:00, 186.20 examples/s]
INFO 2025-05-16 19:29:22,415 instructlab.sdg.checkpointing:44: Saving checkpoint to /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_linguistics_synonyms/data_checkpoint_5b0cb73db587436bbd71c412953aab01.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 248.29ba/s]
INFO 2025-05-16 19:29:22,440 instructlab.sdg.generate_data:478: Generated 70 samples
INFO 2025-05-16 19:29:22,511 instructlab.sdg.checkpointing:59: No existing checkpoints found in /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/knowledge_arts_music_fandom_swifties, generating from scratch
INFO 2025-05-16 19:29:22,511 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
Using 2 workers for batches of size 256
INFO 2025-05-16 19:29:22,517 instructlab.sdg.pipeline:199: Running block: router
INFO 2025-05-16 19:29:31,565 instructlab.sdg.pipeline:199: Running block: SetClassifierValue
INFO 2025-05-16 19:29:31,578 instructlab.sdg.pipeline:199: Running block: duplicate_document_col
INFO 2025-05-16 19:29:31,586 instructlab.sdg.pipeline:199: Running block: gen_detailed_summary
INFO 2025-05-16 19:29:55,561 instructlab.sdg.pipeline:199: Running block: gen_atomic_facts
INFO 2025-05-16 19:30:24,869 instructlab.sdg.pipeline:199: Running block: gen_extractive_summary
INFO 2025-05-16 19:30:45,438 instructlab.sdg.pipeline:199: Running block: flatten_summary_columns
INFO 2025-05-16 19:30:45,461 instructlab.sdg.pipeline:199: Running block: rename_to_document_column
INFO 2025-05-16 19:30:45,479 instructlab.sdg.pipeline:199: Running block: knowledge generation
INFO 2025-05-16 19:36:38,487 instructlab.sdg.pipeline:199: Running block: eval_faithfulness_qa_pair
INFO 2025-05-16 19:46:28,548 instructlab.sdg.pipeline:199: Running block: filter_faithfulness
[... Map/Filter progress bars elided: 24 batches of 256/256 plus a final batch of 77/77, all num_proc=8 ...]
INFO 2025-05-16 19:46:57,013 instructlab.sdg.pipeline:199: Running block: eval_relevancy_qa_pair
INFO 2025-05-16 19:51:22,880 instructlab.sdg.pipeline:199: Running block: filter_relevancy
[... Map/Filter progress bars (batches of 256/256, num_proc=8) elided ...]
examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1140.60 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 647.68 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1186.11 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 679.00 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1180.85 examples/s] Map (num_proc=8): 100%|##########| 109/109 [00:00<00:00, 307.69 examples/s] Filter (num_proc=8): 100%|##########| 109/109 [00:00<00:00, 505.78 examples/s] INFO 2025-05-16 19:51:42,629 instructlab.sdg.pipeline:199: Running block: eval_verify_question INFO 2025-05-16 19:55:43,053 instructlab.sdg.pipeline:199: Running block: filter_verify_question Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 602.57 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1135.88 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 613.06 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1194.64 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 650.10 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1191.31 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 646.85 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1206.91 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 634.24 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1178.42 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 663.59 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1114.49 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 657.74 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1185.12 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 667.39 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1185.46 examples/s] 
Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 653.08 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1132.46 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 659.45 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1155.09 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 632.59 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1177.87 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 672.62 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1160.11 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 672.56 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1179.31 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 638.16 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1169.38 examples/s] Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 674.92 examples/s] Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1144.32 examples/s] Map (num_proc=8): 100%|##########| 118/118 [00:00<00:00, 342.64 examples/s] Filter (num_proc=8): 100%|##########| 118/118 [00:00<00:00, 550.23 examples/s] INFO 2025-05-16 19:56:01,701 instructlab.sdg.checkpointing:44: Saving checkpoint to /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/knowledge_arts_music_fandom_swifties/data_checkpoint_569787e50bdd4b40bca2fe0c760b0576.jsonl Creating json from Arrow format: 100%|##########| 4/4 [00:00<00:00, 34.69ba/s] INFO 2025-05-16 19:56:02,469 instructlab.sdg.generate_data:478: Generated 3152 samples INFO 2025-05-16 19:56:02,599 instructlab.sdg.checkpointing:59: No existing checkpoints found in /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/knowledge_science_animals_birds_black_capped_chickadee, generating from scratch INFO 2025-05-16 19:56:02,599 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded 
batching. Using 2 workers for batches of size 256
INFO 2025-05-16 19:56:02,603 instructlab.sdg.pipeline:199: Running block: router
INFO 2025-05-16 19:56:09,673 instructlab.sdg.pipeline:199: Running block: SetClassifierValue
INFO 2025-05-16 19:56:09,686 instructlab.sdg.pipeline:199: Running block: duplicate_document_col
INFO 2025-05-16 19:56:09,693 instructlab.sdg.pipeline:199: Running block: gen_detailed_summary
INFO 2025-05-16 19:56:31,568 instructlab.sdg.pipeline:199: Running block: gen_atomic_facts
INFO 2025-05-16 19:57:05,806 instructlab.sdg.pipeline:199: Running block: gen_extractive_summary
INFO 2025-05-16 19:57:24,369 instructlab.sdg.pipeline:199: Running block: flatten_summary_columns
INFO 2025-05-16 19:57:24,393 instructlab.sdg.pipeline:199: Running block: rename_to_document_column
INFO 2025-05-16 19:57:24,408 instructlab.sdg.pipeline:199: Running block: knowledge generation
INFO 2025-05-16 20:02:33,913 instructlab.sdg.pipeline:199: Running block: eval_faithfulness_qa_pair
INFO 2025-05-16 20:10:30,237 instructlab.sdg.pipeline:199: Running block: filter_faithfulness
[Map/Filter (num_proc=8) progress bars trimmed: batches of 256, ending with a final 139/139 batch]
INFO 2025-05-16 20:10:57,972 instructlab.sdg.pipeline:199: Running block: eval_relevancy_qa_pair
INFO 2025-05-16 20:13:46,662 instructlab.sdg.pipeline:199: Running block: filter_relevancy
[Map/Filter (num_proc=8) progress bars trimmed, ending with a final 129/129 batch]
INFO 2025-05-16 20:13:59,592 instructlab.sdg.pipeline:199: Running block: eval_verify_question
INFO 2025-05-16 20:16:13,922 instructlab.sdg.pipeline:199: Running block: filter_verify_question
[Map/Filter (num_proc=8) progress bars trimmed, ending with a final 26/26 batch]
INFO 2025-05-16 20:16:26,957 instructlab.sdg.checkpointing:44: Saving checkpoint to /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/knowledge_science_animals_birds_black_capped_chickadee/data_checkpoint_c7d5732b4b9d40c18a2b7912d59456dc.jsonl
Creating json from Arrow format: 100%|##########| 3/3 [00:00<00:00, 30.36ba/s]
INFO 2025-05-16 20:16:27,588 instructlab.sdg.generate_data:478: Generated 2549 samples
INFO 2025-05-16 20:16:27,621 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
Using 2 workers for batches of size 256
INFO 2025-05-16 20:16:27,627 instructlab.sdg.pipeline:199: Running block: gen_mmlu_knowledge
[Filter/Map/Flattening progress bars trimmed: 349 examples, filtered down to 342]
INFO 2025-05-16 20:17:08,795 instructlab.sdg.eval_data:126: Saving MMLU Dataset /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_191556/node_datasets_2025-05-16T19_18_39/mmlubench_knowledge_arts_music_fandom_swifties.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 109.54ba/s]
INFO 2025-05-16 20:17:08,805 instructlab.sdg.eval_data:130: Saving MMLU Task yaml /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_191556/node_datasets_2025-05-16T19_18_39/knowledge_arts_music_fandom_swifties_task.yaml
INFO 2025-05-16 20:17:08,815 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching. Using 2 workers for batches of size 256
INFO 2025-05-16 20:17:08,820 instructlab.sdg.pipeline:199: Running block: gen_mmlu_knowledge
[Filter/Map/Flattening progress bars trimmed: 391 examples, filtered down to 382]
INFO 2025-05-16 20:17:45,373 instructlab.sdg.eval_data:126: Saving MMLU Dataset /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_191556/node_datasets_2025-05-16T19_18_39/mmlubench_knowledge_science_animals_birds_black_capped_chickadee.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 112.36ba/s]
INFO 2025-05-16 20:17:45,382 instructlab.sdg.eval_data:130: Saving MMLU Task yaml /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_191556/node_datasets_2025-05-16T19_18_39/knowledge_science_animals_birds_black_capped_chickadee_task.yaml
[Map/Creating json progress bars trimmed: 125, 85, and 70 skill samples, then 3152 knowledge samples]
INFO 2025-05-16 20:20:25,880 instructlab.sdg.datamixing:774: Knowledge detected to be less than 3.00% of skills (1.61%), upsampling to: 11824
Creating json from Arrow format: 100%|##########| 7/7 [00:00<00:00, 23.79ba/s]
[Map/Filter/Creating json progress bars trimmed: 2549 knowledge samples]
INFO 2025-05-16 20:20:27,969 instructlab.sdg.datamixing:774: Knowledge detected to be less than 3.00% of skills (1.31%), upsampling to: 11824
Creating json from Arrow format: 100%|##########|
6/6 [00:00<00:00, 28.94ba/s]
INFO 2025-05-16 20:20:28,961 instructlab.sdg.datamixing:158: Loading dataset from /usr/share/instructlab/sdg/datasets/skills.jsonl ...
Generating train split: 326137 examples [02:14, 2418.87 examples/s]
INFO 2025-05-16 20:22:47,156 instructlab.model.backends.vllm:512: Waiting for GPU VRAM reclamation...
failed to generate data with exception: An error occurred while generating the dataset

real    67m3.073s
user    0m1.813s
sys     0m1.002s
+ tail -f iso-testrun/ilab-data-generate
[tail output trimmed: it repeats the final log lines above verbatim, ending with the same "failed to generate data with exception: An error occurred while generating the dataset" error]
:q
^C
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ exit
logout
Connection to 169.63.187.52 closed.
mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh mikedep333-ibm-us-east
Last login: Fri May 16 20:43:41 2025 from 98.116.66.226
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ find .
| grep config.yaml
./.config/instructlab/config.yaml.lock
./.config/instructlab/config.yaml
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ vim ./.config/instructlab/config.yaml
-bash: vim: command not found
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ vim ./.config/instructlab/config.yaml
-bash: vim: command not found
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ vi ./.config/instructlab/config.yaml
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ vi ./.config/instructlab/config.yaml
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ cat EL_AI_test_1.5.sh
set -eux
#####
# podman login registry.stage.redhat.io # add credentials
podman login registry.redhat.io # add credentials
ilab --version
# to get a rhc connect command!
sudo cp /run/user/1000/containers/auth.json /etc/ostree/ || sudo cp $HOME/.config/containers/auth.json /etc/ostree # to make bootc switch work
###############
mkdir iso-testrun
ilab config init
#sed -i '/--tensor-parallel-size/,+1d' $HOME/.config/instructlab/config.yaml
#sed -i 's/gpus: 4/gpus: 1/g' $HOME/.config/instructlab/config.yaml
ilab config show > iso-testrun/ilab-config-show
ilab system info > iso-testrun/ilab-system-info
### Pay attention to which models are to be used for testing the specific releases; this is valid for 1.4 !!!
### Also, pay attention to the .stage in the URL - if you're doing prod testing, it'd be docker://registry.redhat.io
ilab model download --repository docker://registry.redhat.io/rhelai1/skills-adapter-v3 --release 1.5
ilab model download --repository docker://registry.redhat.io/rhelai1/knowledge-adapter-v3 --release 1.5
ilab model download --repository docker://registry.redhat.io/rhelai1/granite-3.1-8b-lab-v2 --release 1.5
ilab model download --repository docker://registry.redhat.io/rhelai1/granite-3.1-8b-starter-v2 --release 1.5
ilab model download --repository docker://registry.redhat.io/rhelai1/mixtral-8x7b-instruct-v0-1 --release 1.5
ilab model download --repository docker://registry.redhat.io/rhelai1/prometheus-8x7b-v2-0 --release 1.5
# END OF MODEL DOWNLOADS
ilab taxonomy diff
ilab model serve # Ctrl + C after gunicorn starts
ilab model chat
time ilab data generate | tee iso-testrun/ilab-data-generate
tail -f iso-testrun/ilab-data-generate # to watch progress and not stress about ssh connection drops
# occasionally check output of nvidia-smi -l 3
### end of data generation
shuf -n 15000 .local/share/instructlab/datasets/`ls -1 .local/share/instructlab/datasets/ | head -n1`/skills_train_msgs_*.jsonl > .local/share/instructlab/datasets/`ls -1 .local/share/instructlab/datasets/ | head -n1`/skills_train_msgs_reduced.jsonl
tmux a
time ilab model train -y --force-clear-phased-cache --enable-serving-output --strategy lab-multiphase --phased-phase1-data ~/.local/share/instructlab/datasets/`ls -1 ~/.local/share/instructlab/datasets/ | head -n1`/knowledge_train_msgs_*.jsonl --phased-phase2-data ~/.local/share/instructlab/datasets/`ls -1 .local/share/instructlab/datasets/ | head -n1`/skills_train_msgs_reduced.jsonl --phased-phase1-num-epochs 2 --phased-phase2-num-epochs 2 | tee iso-testrun/ilab-train
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ time ilab data generate | tee iso-testrun/ilab-data-generate-8-gpufix
INFO 2025-05-16 20:49:50,984 instructlab.process.process:300:
Started subprocess with PID 1. Logs are being written to /var/home/cloud-user/.local/share/instructlab/logs/generation/generation-4fdd6a98-3297-11f0-9aff-0200048919a9.log.
INFO 2025-05-16 20:49:54,865 instructlab.model.backends.vllm:115: Trying to connect to model server at http://127.0.0.1:8000/v1
INFO 2025-05-16 20:49:56,445 instructlab.model.backends.vllm:332: vLLM starting up on pid 5 at http://127.0.0.1:53067/v1
INFO 2025-05-16 20:49:56,445 instructlab.model.backends.vllm:123: Starting a temporary vLLM server at http://127.0.0.1:53067/v1
INFO 2025-05-16 20:49:56,445 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 1/120
[repeated "Waiting for the vLLM server to start" messages trimmed: attempts 2/120 through 51/120, retried every ~3 s from 20:49:59 to 20:52:41]
INFO 2025-05-16 20:52:44,782 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment...
Attempt: 52/120 INFO 2025-05-16 20:52:48,036 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 53/120 INFO 2025-05-16 20:52:51,337 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 54/120 INFO 2025-05-16 20:52:54,743 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 55/120 INFO 2025-05-16 20:52:57,994 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 56/120 INFO 2025-05-16 20:53:01,166 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 57/120 INFO 2025-05-16 20:53:04,360 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 58/120 INFO 2025-05-16 20:53:07,539 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 59/120 INFO 2025-05-16 20:53:10,767 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 60/120 INFO 2025-05-16 20:53:14,181 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 61/120 INFO 2025-05-16 20:53:17,362 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 62/120 INFO 2025-05-16 20:53:20,673 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... 
Attempt: 63/120 INFO 2025-05-16 20:53:24,028 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 64/120 INFO 2025-05-16 20:53:27,212 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 65/120 INFO 2025-05-16 20:53:30,572 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 66/120 INFO 2025-05-16 20:53:33,766 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 67/120 INFO 2025-05-16 20:53:37,165 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 68/120 INFO 2025-05-16 20:53:40,363 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 69/120 INFO 2025-05-16 20:53:43,704 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 70/120 INFO 2025-05-16 20:53:47,058 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 71/120 INFO 2025-05-16 20:53:50,333 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 72/120 INFO 2025-05-16 20:53:53,727 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 73/120 INFO 2025-05-16 20:53:56,988 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... 
Attempt: 74/120 INFO 2025-05-16 20:54:00,217 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 75/120 INFO 2025-05-16 20:54:03,693 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 76/120 INFO 2025-05-16 20:54:06,856 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:53067/v1, this might take a moment... Attempt: 77/120 INFO 2025-05-16 20:54:06,861 instructlab.model.backends.vllm:145: vLLM engine successfully started at http://127.0.0.1:53067/v1 INFO 2025-05-16 20:54:07,040 numexpr.utils:146: Note: detected 208 virtual cores but NumExpr set to maximum of 64, check "NUMEXPR_MAX_THREADS" environment variable. INFO 2025-05-16 20:54:07,040 numexpr.utils:149: Note: NumExpr detected 208 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 16. INFO 2025-05-16 20:54:07,041 numexpr.utils:162: NumExpr defaulting to 16 threads. INFO 2025-05-16 20:54:07,290 datasets:54: PyTorch version 2.6.0 available. INFO 2025-05-16 20:54:08,140 instructlab:206: Generating synthetic data using '/usr/share/instructlab/sdg/pipelines/agentic' pipeline, '/var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1' model, '/var/home/cloud-user/.local/share/instructlab/taxonomy' taxonomy, against http://127.0.0.1:53067/v1 server INFO 2025-05-16 20:54:08,141 root:356: Converting taxonomy to samples INFO 2025-05-16 20:54:08,908 instructlab.sdg.utils.taxonomy:143: Processing files... INFO 2025-05-16 20:54:08,908 instructlab.sdg.utils.taxonomy:148: Pattern 'swifties.md' matched 1 files. 
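The "Attempt: N/120" lines above come from a bounded polling loop: the client probes the vLLM endpoint once per attempt, up to a fixed cap, and gives up with an error if the server never comes up. A minimal, generic sketch of that pattern (illustrative only, not InstructLab's actual implementation; `wait_for_server` and `fake_probe` are hypothetical names):

```python
import time

def wait_for_server(is_ready, max_attempts=120, delay=0.0):
    """Poll `is_ready` until it returns True or the attempt cap is hit.

    Mirrors the bounded retry behaviour in the log above: one probe per
    attempt, up to `max_attempts` probes, with a fixed delay between them.
    """
    for attempt in range(1, max_attempts + 1):
        if is_ready():
            return attempt  # number of probes it took to see the server up
        time.sleep(delay)
    raise TimeoutError(f"server not ready after {max_attempts} attempts")

# Example probe that succeeds on the third call (stands in for an HTTP check).
calls = {"n": 0}
def fake_probe():
    calls["n"] += 1
    return calls["n"] >= 3
```

In the real backend each probe is an HTTP request against the OpenAI-compatible `/v1` endpoint and the delay is a few seconds, which is why 77 attempts here span roughly four minutes.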
INFO 2025-05-16 20:54:08,908 instructlab.sdg.utils.taxonomy:152: Processing file: /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_204950/preprocessed_2025-05-16T20_54_08/documents/knowledge_arts_music_fandom_swifties_ommcpoil/swifties.md
INFO 2025-05-16 20:54:08,908 instructlab.sdg.utils.taxonomy:156: Added file path: /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_204950/preprocessed_2025-05-16T20_54_08/documents/knowledge_arts_music_fandom_swifties_ommcpoil/swifties.md
INFO 2025-05-16 20:54:09,245 instructlab.sdg.utils.taxonomy:143: Processing files...
INFO 2025-05-16 20:54:09,245 instructlab.sdg.utils.taxonomy:148: Pattern 'chickadee.md' matched 1 files.
INFO 2025-05-16 20:54:09,245 instructlab.sdg.utils.taxonomy:152: Processing file: /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_204950/preprocessed_2025-05-16T20_54_08/documents/knowledge_science_animals_birds_black_capped_chickadee_qz1qhkkd/chickadee.md
INFO 2025-05-16 20:54:09,245 instructlab.sdg.utils.taxonomy:156: Added file path: /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_204950/preprocessed_2025-05-16T20_54_08/documents/knowledge_science_animals_birds_black_capped_chickadee_qz1qhkkd/chickadee.md
INFO 2025-05-16 20:54:47,090 instructlab.sdg.utils.chunkers:144: Found the docling models
INFO 2025-05-16 20:54:47,561 instructlab.sdg.utils.chunkers:249: Successfully loaded tokenizer from: /var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1
INFO 2025-05-16 20:54:47,685 docling.document_converter:269: Going to convert document batch...
INFO 2025-05-16 20:54:47,685 docling.document_converter:304: Initializing pipeline for SimplePipeline with options hash 4cc01982ae99b46a2a63fcda46c47c35
INFO 2025-05-16 20:54:47,685 docling.pipeline.base_pipeline:39: Processing document swifties.md
INFO 2025-05-16 20:54:48,196 docling.document_converter:284: Finished converting document swifties.md in 0.51 sec.
INFO 2025-05-16 20:54:48,406 instructlab.sdg.utils.chunkers:144: Found the docling models
INFO 2025-05-16 20:54:48,779 instructlab.sdg.utils.chunkers:249: Successfully loaded tokenizer from: /var/home/cloud-user/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1
INFO 2025-05-16 20:54:48,780 docling.document_converter:269: Going to convert document batch...
INFO 2025-05-16 20:54:48,780 docling.document_converter:304: Initializing pipeline for SimplePipeline with options hash 4cc01982ae99b46a2a63fcda46c47c35
INFO 2025-05-16 20:54:48,780 docling.pipeline.base_pipeline:39: Processing document chickadee.md
INFO 2025-05-16 20:54:50,130 docling.document_converter:284: Finished converting document chickadee.md in 1.35 sec.
INFO 2025-05-16 20:54:50,230 instructlab.sdg.generate_data:405: Taxonomy converted to samples and written to /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_204950/preprocessed_2025-05-16T20_54_08
INFO 2025-05-16 20:54:50,254 instructlab.sdg.generate_data:441: Synthesizing new instructions. If you aren't satisfied with the generated instructions, interrupt training (Ctrl-C) and try adjusting your YAML files. Adding more examples may help.
Generating train split: 125 examples [00:00, 5822.12 examples/s]
INFO 2025-05-16 20:54:50,419 instructlab.sdg.checkpointing:64: Loading existing checkpoints from /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_grounded_linguistics_inclusion, with 125 rows
INFO 2025-05-16 20:54:50,429 instructlab.sdg.checkpointing:68: Found 1 missing rows in the dataset
INFO 2025-05-16 20:54:50,429 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
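The checkpointing messages above ("with 125 rows", "Found 1 missing rows") show a resume mechanism: previously generated rows are reloaded from the checkpoint directory, and only seed rows with no generated counterpart are run through the pipeline again. A minimal sketch of that diffing step, assuming rows can be keyed by their seed question (the real logic lives in `instructlab.sdg.checkpointing` and may key rows differently):

```python
def find_missing_rows(seed_rows, checkpointed_rows, key="question"):
    """Return seed rows that have no generated counterpart in the checkpoint."""
    done = {row[key] for row in checkpointed_rows}
    return [row for row in seed_rows if row[key] not in done]

# Five seed rows, four already present in a saved checkpoint -> one to redo.
seeds = [{"question": f"q{i}"} for i in range(5)]
checkpoint = [{"question": f"q{i}", "response": "..."} for i in range(4)]
missing = find_missing_rows(seeds, checkpoint)
```

This is why the later runs in this log report "Found 0 missing rows" and finish in milliseconds: everything was already covered by earlier checkpoints.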
Using 2 workers for batches of size 256
INFO 2025-05-16 20:54:53,153 instructlab.sdg.blocks.llmblock:56: LLM server supports batched inputs: True
INFO 2025-05-16 20:54:53,154 instructlab.sdg.pipeline:199: Running block: gen_contexts
INFO 2025-05-16 20:54:58,184 instructlab.sdg.pipeline:199: Running block: gen_grounded_questions
INFO 2025-05-16 20:55:03,105 instructlab.sdg.pipeline:199: Running block: eval_grounded_questions
INFO 2025-05-16 20:55:06,842 instructlab.sdg.pipeline:199: Running block: filter_grounded_questions
Map (num_proc=8): 100%|##########| 30/30 [00:00<00:00, 102.85 examples/s]
Filter (num_proc=8): 100%|##########| 30/30 [00:00<00:00, 137.92 examples/s]
INFO 2025-05-16 20:55:07,920 instructlab.sdg.pipeline:199: Running block: gen_grounded_responses
INFO 2025-05-16 20:55:11,175 instructlab.sdg.pipeline:199: Running block: evaluate_grounded_qa_pair
INFO 2025-05-16 20:55:13,692 instructlab.sdg.pipeline:199: Running block: filter_grounded_qa_pair
Map (num_proc=8): 100%|##########| 30/30 [00:00<00:00, 102.91 examples/s]
Filter (num_proc=8): 100%|##########| 30/30 [00:00<00:00, 136.04 examples/s]
INFO 2025-05-16 20:55:14,756 instructlab.sdg.pipeline:199: Running block: combine_question_and_context
Map (num_proc=8): 100%|##########| 30/30 [00:00<00:00, 96.24 examples/s]
INFO 2025-05-16 20:55:15,345 instructlab.sdg.pipeline:199: Running block: router
INFO 2025-05-16 20:55:17,508 instructlab.sdg.pipeline:199: Running block: icl_populator
Map (num_proc=8): 100%|##########| 30/30 [00:00<00:00, 89.03 examples/s]
INFO 2025-05-16 20:55:18,123 instructlab.sdg.pipeline:199: Running block: analyzer
INFO 2025-05-16 20:55:23,936 instructlab.sdg.pipeline:199: Running block: critic
INFO 2025-05-16 20:55:31,450 instructlab.sdg.pipeline:199: Running block: planner
INFO 2025-05-16 20:55:36,856 instructlab.sdg.pipeline:199: Running block: revised_responder
INFO 2025-05-16 20:55:49,813 instructlab.sdg.pipeline:199: Running block: judge
INFO 2025-05-16 20:55:56,246 instructlab.sdg.pipeline:199: Running block: filter_judgement
Map (num_proc=8): 100%|##########| 30/30 [00:00<00:00, 87.22 examples/s]
Filter (num_proc=8): 100%|##########| 30/30 [00:00<00:00, 135.65 examples/s]
INFO 2025-05-16 20:55:57,389 instructlab.sdg.pipeline:199: Running block: response_selector
Map (num_proc=8): 100%|##########| 30/30 [00:00<00:00, 44.65 examples/s]
INFO 2025-05-16 20:55:58,347 instructlab.sdg.checkpointing:44: Saving checkpoint to /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_grounded_linguistics_inclusion/data_checkpoint_6c34d91841104e5aabf1424098c097ae.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 257.22ba/s]
INFO 2025-05-16 20:55:58,394 instructlab.sdg.generate_data:478: Generated 155 samples
Generating train split: 85 examples [00:00, 19786.65 examples/s]
INFO 2025-05-16 20:55:58,527 instructlab.sdg.checkpointing:64: Loading existing checkpoints from /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_grounded_linguistics_writing_rewriting, with 85 rows
INFO 2025-05-16 20:55:58,534 instructlab.sdg.checkpointing:68: Found 0 missing rows in the dataset
INFO 2025-05-16 20:55:58,534 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
Using 2 workers for batches of size 256
INFO 2025-05-16 20:55:58,554 instructlab.sdg.generate_data:478: Generated 85 samples
Generating train split: 70 examples [00:00, 26452.95 examples/s]
INFO 2025-05-16 20:55:58,606 instructlab.sdg.checkpointing:64: Loading existing checkpoints from /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/compositional_skills_linguistics_synonyms, with 70 rows
INFO 2025-05-16 20:55:58,612 instructlab.sdg.checkpointing:68: Found 0 missing rows in the dataset
INFO 2025-05-16 20:55:58,612 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
Using 2 workers for batches of size 256
INFO 2025-05-16 20:55:58,625 instructlab.sdg.generate_data:478: Generated 70 samples
Generating train split: 3152 examples [00:00, 84409.35 examples/s]
INFO 2025-05-16 20:55:58,707 instructlab.sdg.checkpointing:64: Loading existing checkpoints from /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/knowledge_arts_music_fandom_swifties, with 3152 rows
INFO 2025-05-16 20:55:58,732 instructlab.sdg.checkpointing:68: Found 6 missing rows in the dataset
INFO 2025-05-16 20:55:58,732 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
Using 2 workers for batches of size 256
INFO 2025-05-16 20:55:58,735 instructlab.sdg.pipeline:199: Running block: router
INFO 2025-05-16 20:56:01,134 instructlab.sdg.pipeline:199: Running block: SetClassifierValue
INFO 2025-05-16 20:56:01,146 instructlab.sdg.pipeline:199: Running block: duplicate_document_col
INFO 2025-05-16 20:56:01,152 instructlab.sdg.pipeline:199: Running block: gen_detailed_summary
INFO 2025-05-16 20:56:07,456 instructlab.sdg.pipeline:199: Running block: gen_atomic_facts
INFO 2025-05-16 20:56:15,243 instructlab.sdg.pipeline:199: Running block: gen_extractive_summary
INFO 2025-05-16 20:56:19,491 instructlab.sdg.pipeline:199: Running block: flatten_summary_columns
INFO 2025-05-16 20:56:19,508 instructlab.sdg.pipeline:199: Running block: rename_to_document_column
INFO 2025-05-16 20:56:19,522 instructlab.sdg.pipeline:199: Running block: knowledge generation
INFO 2025-05-16 20:56:53,148 instructlab.sdg.pipeline:199: Running block: eval_faithfulness_qa_pair
INFO 2025-05-16 20:57:06,798 instructlab.sdg.pipeline:199: Running block: filter_faithfulness
Map (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 616.72 examples/s]
Filter (num_proc=8): 100%|##########| 256/256 [00:00<00:00, 1020.65 examples/s]
Map (num_proc=8): 100%|##########| 28/28 [00:00<00:00, 84.27 examples/s]
Filter (num_proc=8): 100%|##########| 28/28 [00:00<00:00, 123.10 examples/s]
INFO 2025-05-16 20:57:09,195 instructlab.sdg.pipeline:199: Running block: eval_relevancy_qa_pair
INFO 2025-05-16 20:57:14,856 instructlab.sdg.pipeline:199: Running block: filter_relevancy
Map (num_proc=8): 100%|##########| 194/194 [00:00<00:00, 491.74 examples/s]
Filter (num_proc=8): 100%|##########| 194/194 [00:00<00:00, 814.75 examples/s]
INFO 2025-05-16 20:57:16,088 instructlab.sdg.pipeline:199: Running block: eval_verify_question
INFO 2025-05-16 20:57:22,217 instructlab.sdg.pipeline:199: Running block: filter_verify_question
Map (num_proc=8): 100%|##########| 173/173 [00:00<00:00, 446.50 examples/s]
Filter (num_proc=8): 100%|##########| 173/173 [00:00<00:00, 734.86 examples/s]
INFO 2025-05-16 20:57:23,447 instructlab.sdg.checkpointing:44: Saving checkpoint to /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/knowledge_arts_music_fandom_swifties/data_checkpoint_9fb7728cd03d4ab0a5eb35a11f932c81.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 127.00ba/s]
INFO 2025-05-16 20:57:24,083 instructlab.sdg.generate_data:478: Generated 3276 samples
Generating train split: 2549 examples [00:00, 27916.54 examples/s]
INFO 2025-05-16 20:57:24,244 instructlab.sdg.checkpointing:64: Loading existing checkpoints from /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/knowledge_science_animals_birds_black_capped_chickadee, with 2549 rows
INFO 2025-05-16 20:57:24,267 instructlab.sdg.checkpointing:68: Found 4 missing rows in the dataset
INFO 2025-05-16 20:57:24,267 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
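Each "Running block: ..." line above is one stage of a sequential SDG pipeline: generation blocks add columns to rows, eval blocks score them, and filter blocks drop rows that fail a threshold (which is why the row counts shrink, e.g. 256 to 194 to 173). A toy sketch of that block-chaining pattern, purely illustrative (InstructLab's actual `Pipeline`/`Block` classes are far richer; these names and lambdas are hypothetical):

```python
def run_pipeline(blocks, rows):
    """Apply named blocks in order; each block maps a list of rows to a list of rows."""
    for name, block in blocks:
        print(f"Running block: {name}")
        rows = block(rows)
    return rows

# A generation-style block that attaches a score, then a filter block that
# drops low-scoring rows -- mirroring the eval/filter pairs in the log.
blocks = [
    ("eval_faithfulness_qa_pair", lambda rows: [dict(r, score=len(r["answer"])) for r in rows]),
    ("filter_faithfulness", lambda rows: [r for r in rows if r["score"] >= 3]),
]
result = run_pipeline(blocks, [{"answer": "yes"}, {"answer": "no"}])
```

The design means a single bad stage (or an unreachable model server) fails the whole chain, which matches how this run later aborts with one top-level error.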
Using 2 workers for batches of size 256
INFO 2025-05-16 20:57:24,270 instructlab.sdg.pipeline:199: Running block: router
INFO 2025-05-16 20:57:26,144 instructlab.sdg.pipeline:199: Running block: SetClassifierValue
INFO 2025-05-16 20:57:26,156 instructlab.sdg.pipeline:199: Running block: duplicate_document_col
INFO 2025-05-16 20:57:26,163 instructlab.sdg.pipeline:199: Running block: gen_detailed_summary
INFO 2025-05-16 20:57:29,349 instructlab.sdg.pipeline:199: Running block: gen_atomic_facts
INFO 2025-05-16 20:57:36,635 instructlab.sdg.pipeline:199: Running block: gen_extractive_summary
INFO 2025-05-16 20:57:38,975 instructlab.sdg.pipeline:199: Running block: flatten_summary_columns
INFO 2025-05-16 20:57:38,991 instructlab.sdg.pipeline:199: Running block: rename_to_document_column
INFO 2025-05-16 20:57:39,004 instructlab.sdg.pipeline:199: Running block: knowledge generation
INFO 2025-05-16 20:58:07,371 instructlab.sdg.pipeline:199: Running block: eval_faithfulness_qa_pair
INFO 2025-05-16 20:58:12,957 instructlab.sdg.pipeline:199: Running block: filter_faithfulness
Map (num_proc=8): 100%|##########| 130/130 [00:00<00:00, 345.86 examples/s]
Filter (num_proc=8): 100%|##########| 130/130 [00:00<00:00, 538.15 examples/s]
INFO 2025-05-16 20:58:14,269 instructlab.sdg.pipeline:199: Running block: eval_relevancy_qa_pair
INFO 2025-05-16 20:58:17,000 instructlab.sdg.pipeline:199: Running block: filter_relevancy
Map (num_proc=8): 100%|##########| 54/54 [00:00<00:00, 151.56 examples/s]
Filter (num_proc=8): 100%|##########| 54/54 [00:00<00:00, 227.24 examples/s]
INFO 2025-05-16 20:58:18,259 instructlab.sdg.pipeline:199: Running block: eval_verify_question
INFO 2025-05-16 20:58:21,404 instructlab.sdg.pipeline:199: Running block: filter_verify_question
Map (num_proc=8): 100%|##########| 52/52 [00:00<00:00, 143.99 examples/s]
Filter (num_proc=8): 100%|##########| 52/52 [00:00<00:00, 218.14 examples/s]
INFO 2025-05-16 20:58:22,679 instructlab.sdg.checkpointing:44: Saving checkpoint to /var/home/cloud-user/.local/share/instructlab/datasets/checkpoints/knowledge_science_animals_birds_black_capped_chickadee/data_checkpoint_33316355ed9c4795a9e084691de0b72e.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 218.77ba/s]
INFO 2025-05-16 20:58:23,188 instructlab.sdg.generate_data:478: Generated 2599 samples
INFO 2025-05-16 20:58:23,215 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
Using 2 workers for batches of size 256
INFO 2025-05-16 20:58:23,219 instructlab.sdg.pipeline:199: Running block: gen_mmlu_knowledge
Filter: 100%|##########| 355/355 [00:00<00:00, 45323.81 examples/s]
Filter: 100%|##########| 355/355 [00:00<00:00, 25613.30 examples/s]
Flattening the indices: 100%|##########| 355/355 [00:00<00:00, 39431.63 examples/s]
Map: 100%|##########| 355/355 [00:00<00:00, 11058.79 examples/s]
Map: 100%|##########| 355/355 [00:00<00:00, 10256.15 examples/s]
Map: 100%|##########| 355/355 [00:00<00:00, 10301.07 examples/s]
Filter: 100%|##########| 355/355 [00:00<00:00, 39055.16 examples/s]
Filter: 100%|##########| 355/355 [00:00<00:00, 20605.55 examples/s]
Filter: 100%|##########| 353/353 [00:00<00:00, 20290.38 examples/s]
Flattening the indices: 100%|##########| 353/353 [00:00<00:00, 37776.88 examples/s]
Casting to class labels: 100%|##########| 353/353 [00:00<00:00, 10543.56 examples/s]
INFO 2025-05-16 20:58:38,381 instructlab.sdg.eval_data:126: Saving MMLU Dataset /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_204950/node_datasets_2025-05-16T20_54_08/mmlubench_knowledge_arts_music_fandom_swifties.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 115.99ba/s]
INFO 2025-05-16 20:58:38,390 instructlab.sdg.eval_data:130: Saving MMLU Task yaml /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_204950/node_datasets_2025-05-16T20_54_08/knowledge_arts_music_fandom_swifties_task.yaml
INFO 2025-05-16 20:58:38,400 instructlab.sdg.pipeline:161: Running pipeline with multi-threaded batching.
Using 2 workers for batches of size 256
INFO 2025-05-16 20:58:38,404 instructlab.sdg.pipeline:199: Running block: gen_mmlu_knowledge
Filter: 100%|##########| 382/382 [00:00<00:00, 49746.15 examples/s]
Filter: 100%|##########| 382/382 [00:00<00:00, 26542.27 examples/s]
Flattening the indices: 100%|##########| 382/382 [00:00<00:00, 43644.25 examples/s]
Map: 100%|##########| 382/382 [00:00<00:00, 11010.79 examples/s]
Map: 100%|##########| 382/382 [00:00<00:00, 10253.45 examples/s]
Map: 100%|##########| 382/382 [00:00<00:00, 10340.53 examples/s]
Filter: 100%|##########| 382/382 [00:00<00:00, 39679.64 examples/s]
Filter: 100%|##########| 382/382 [00:00<00:00, 20491.42 examples/s]
Filter: 100%|##########| 375/375 [00:00<00:00, 20579.95 examples/s]
Flattening the indices: 100%|##########| 375/375 [00:00<00:00, 36612.29 examples/s]
Casting to class labels: 100%|##########| 375/375 [00:00<00:00, 10550.40 examples/s]
INFO 2025-05-16 20:58:54,991 instructlab.sdg.eval_data:126: Saving MMLU Dataset /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_204950/node_datasets_2025-05-16T20_54_08/mmlubench_knowledge_science_animals_birds_black_capped_chickadee.jsonl
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 108.44ba/s]
INFO 2025-05-16 20:58:55,001 instructlab.sdg.eval_data:130: Saving MMLU Task yaml /var/home/cloud-user/.local/share/instructlab/datasets/2025-05-16_204950/node_datasets_2025-05-16T20_54_08/knowledge_science_animals_birds_black_capped_chickadee_task.yaml
Map (num_proc=8): 100%|##########| 155/155 [00:00<00:00, 344.08 examples/s]
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 69.68ba/s]
Map (num_proc=8): 100%|##########| 85/85 [00:00<00:00, 214.06 examples/s]
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 125.58ba/s]
Map (num_proc=8): 100%|##########| 70/70 [00:00<00:00, 199.30 examples/s]
Creating json from Arrow format: 100%|##########| 1/1 [00:00<00:00, 210.94ba/s]
Map: 100%|##########| 3276/3276 [00:00<00:00, 8831.84 examples/s]
Map: 100%|##########| 3276/3276 [00:00<00:00, 32578.74 examples/s]
Filter: 100%|##########| 3276/3276 [00:00<00:00, 60491.57 examples/s]
Map: 100%|##########| 73/73 [00:00<00:00, 10435.01 examples/s]
Map: 100%|##########| 73/73 [00:00<00:00, 17914.94 examples/s]
Creating json from Arrow format: 100%|##########| 4/4 [00:00<00:00, 47.56ba/s]
Map: 100%|##########| 3276/3276 [00:00<00:00, 8871.92 examples/s]
Map: 100%|##########| 3276/3276 [00:00<00:00, 8734.35 examples/s]
Map: 100%|##########| 3276/3276 [00:00<00:00, 9000.04 examples/s]
Map: 100%|##########| 3276/3276 [00:00<00:00, 32770.42 examples/s]
Filter: 100%|##########| 3276/3276 [00:00<00:00, 60879.66 examples/s]
Map: 100%|##########| 73/73 [00:00<00:00, 10366.12 examples/s]
INFO 2025-05-16 20:59:11,016 instructlab.sdg.datamixing:774: Knowledge detected to be less than 3.00% of skills (1.68%), upsampling to: 11824
Creating json from Arrow format: 100%|##########| 7/7 [00:00<00:00, 25.63ba/s]
Map: 100%|##########| 2599/2599 [00:00<00:00, 8879.17 examples/s]
Map: 100%|##########| 2599/2599 [00:00<00:00, 32562.45 examples/s]
Filter: 100%|##########| 2599/2599 [00:00<00:00, 59478.58 examples/s]
Map: 100%|##########| 66/66 [00:00<00:00, 10450.92 examples/s]
Map: 100%|##########| 66/66 [00:00<00:00, 17161.00 examples/s]
Creating json from Arrow format: 100%|##########| 3/3 [00:00<00:00, 45.44ba/s]
Map: 100%|##########| 2599/2599 [00:00<00:00, 8889.64 examples/s]
Map: 100%|##########| 2599/2599 [00:00<00:00, 9018.61 examples/s]
Map: 100%|##########| 2599/2599 [00:00<00:00, 8960.69 examples/s]
Map: 100%|##########| 2599/2599 [00:00<00:00, 32301.45 examples/s]
Filter: 100%|##########| 2599/2599 [00:00<00:00, 59526.97 examples/s]
Map: 100%|##########| 66/66 [00:00<00:00, 10346.24 examples/s]
INFO 2025-05-16 20:59:13,015 instructlab.sdg.datamixing:774: Knowledge detected to be less than 3.00% of skills (1.34%), upsampling to: 11824
Creating json from Arrow format: 100%|##########| 6/6 [00:00<00:00, 29.20ba/s]
INFO 2025-05-16 20:59:14,023 instructlab.sdg.datamixing:158: Loading dataset from /usr/share/instructlab/sdg/datasets/skills.jsonl ...
Generating train split: 301205 examples [02:03, 2432.96 examples/s]
INFO 2025-05-16 21:01:23,091 instructlab.model.backends.vllm:512: Waiting for GPU VRAM reclamation...
failed to generate data with exception: An error occurred while generating the dataset

real	11m46.468s
user	0m0.351s
sys	0m0.239s
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ find . | grep yaml
./.config/instructlab/config.yaml.lock
./.config/instructlab/config.yaml
./.local/share/instructlab/datasets/2025-05-16_191556/node_datasets_2025-05-16T19_18_39/knowledge_arts_music_fandom_swifties_task.yaml
./.local/share/instructlab/datasets/2025-05-16_191556/node_datasets_2025-05-16T19_18_39/knowledge_science_animals_birds_black_capped_chickadee_task.yaml
./.local/share/instructlab/datasets/2025-05-16_191556/knowledge_recipe_2025-05-16T19_18_39.yaml
./.local/share/instructlab/datasets/2025-05-16_191556/skills_recipe_2025-05-16T19_18_39.yaml
./.local/share/instructlab/datasets/2025-05-16_204950/node_datasets_2025-05-16T20_54_08/knowledge_arts_music_fandom_swifties_task.yaml
./.local/share/instructlab/datasets/2025-05-16_204950/node_datasets_2025-05-16T20_54_08/knowledge_science_animals_birds_black_capped_chickadee_task.yaml
./.local/share/instructlab/datasets/2025-05-16_204950/knowledge_recipe_2025-05-16T20_54_08.yaml
./.local/share/instructlab/datasets/2025-05-16_204950/skills_recipe_2025-05-16T20_54_08.yaml
./.local/share/instructlab/internal/train_configuration/additional/additional_args.yaml
./.local/share/instructlab/internal/system_profiles/amd/mi300x/mi300x_x4.yaml
./.local/share/instructlab/internal/system_profiles/amd/mi300x/mi300x_x2.yaml
./.local/share/instructlab/internal/system_profiles/amd/mi300x/mi300x_x8.yaml
./.local/share/instructlab/taxonomy/.markdownlint-cli2.yaml
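The datamixing messages above report the knowledge share at 1.68% and 1.34% of skills and upsample both to 11824 samples. The exact target formula in `instructlab.sdg.datamixing` is not visible in this log; a hedged sketch of the general idea (check the knowledge share against a minimum percentage and repeat knowledge rows to close the gap; `upsample_knowledge` and its formula are illustrative assumptions):

```python
import math

def upsample_knowledge(knowledge, skills_count, min_pct=0.03):
    """If knowledge rows are under `min_pct` of the skills count, repeat them
    until a target size is reached. Illustrative formula, not the one
    actually used by instructlab.sdg.datamixing."""
    share = len(knowledge) / skills_count
    if share >= min_pct:
        return knowledge
    target = math.ceil(skills_count * min_pct)
    reps = math.ceil(target / len(knowledge))
    return (knowledge * reps)[:target]

# 10 knowledge rows against 1000 skills rows is a 1% share, so the rows
# are repeated up to the 3% target of 30.
rows = [{"id": i} for i in range(10)]
mixed = upsample_knowledge(rows, skills_count=1000)
```

Upsampling by repetition keeps the small knowledge contribution from being drowned out by the much larger skills dataset (301205 examples here) during the final mix.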
./.local/share/instructlab/taxonomy/compositional_skills/grounded/linguistics/inclusion/qna.yaml ./.local/share/instructlab/taxonomy/compositional_skills/grounded/linguistics/writing/rewriting/qna.yaml ./.local/share/instructlab/taxonomy/compositional_skills/linguistics/synonyms/qna.yaml ./.local/share/instructlab/taxonomy/docs/template_qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/common_sense_reasoning/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/linguistics_reasoning/logical_sequence_of_words/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/linguistics_reasoning/object_identification/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/linguistics_reasoning/odd_one_out/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/logical_reasoning/causal/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/logical_reasoning/general/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/logical_reasoning/tabular/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/mathematical_reasoning/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/temporal_reasoning/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/theory_of_mind/qna.yaml ./.local/share/instructlab/taxonomy/foundational_skills/reasoning/unconventional_reasoning/lower_score_wins/qna.yaml ./.local/share/instructlab/taxonomy/knowledge/arts/music/fandom/swifties/qna.yaml ./.local/share/instructlab/taxonomy/knowledge/science/animals/birds/black_capped_chickadee/qna.yaml ./.local/share/instructlab/taxonomy/scripts/check-yaml.py [cloud-user@mdepaulo-v15-7-prod-amd ~]$ cat df 0^C [cloud-user@mdepaulo-v15-7-prod-amd ~]$ df -hl Filesystem Size Used Avail Use% Mounted on devtmpfs 4.0M 0 4.0M 0% /dev tmpfs 882G 168K 882G 1% /dev/shm tmpfs 353G 2.6M 353G 1% /run /dev/vda4 249G 244G 5.1G 98% 
/sysroot overlay 39M 39M 0 100% / tmpfs 882G 20K 882G 1% /tmp /dev/vda3 960M 101M 860M 11% /boot /dev/vda2 501M 7.1M 494M 2% /boot/efi tmpfs 177G 28K 177G 1% /run/user/1000 [cloud-user@mdepaulo-v15-7-prod-amd ~]$ time ilab data generate | tee iso-testrun/ilab-data-generate-8-gpufix2 INFO 2025-05-16 21:05:45,352 instructlab.process.process:300: Started subprocess with PID 1. Logs are being written to /var/home/cloud-user/.local/share/instructlab/logs/generation/generation-88b6617e-3299-11f0-9046-0200048919a9.log. INFO 2025-05-16 21:05:49,226 instructlab.model.backends.vllm:115: Trying to connect to model server at http://127.0.0.1:8000/v1 INFO 2025-05-16 21:05:50,652 instructlab.model.backends.vllm:332: vLLM starting up on pid 5 at http://127.0.0.1:43777/v1 INFO 2025-05-16 21:05:50,652 instructlab.model.backends.vllm:123: Starting a temporary vLLM server at http://127.0.0.1:43777/v1 INFO 2025-05-16 21:05:50,652 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43777/v1, this might take a moment... Attempt: 1/120 INFO 2025-05-16 21:05:54,070 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43777/v1, this might take a moment... Attempt: 2/120 INFO 2025-05-16 21:05:57,480 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43777/v1, this might take a moment... Attempt: 3/120 INFO 2025-05-16 21:06:00,935 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43777/v1, this might take a moment... Attempt: 4/120 INFO 2025-05-16 21:06:04,421 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43777/v1, this might take a moment... Attempt: 5/120 INFO 2025-05-16 21:06:07,764 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43777/v1, this might take a moment... 
Attempt: 6/120
[... identical wait messages repeated every ~3s, attempts 6/120 through 16/120 ...]
INFO 2025-05-16 21:06:44,487 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43777/v1, this might take a moment...
Attempt: 17/120 INFO 2025-05-16 21:06:47,834 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:43777/v1, this might take a moment... Attempt: 18/120 ^C Aborted! ^C real 1m7.328s user 0m0.095s sys 0m0.066s [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo mkfs.xfs -L ilab-data /dev^C [cloud-user@mdepaulo-v15-7-prod-amd ~]$ ps -ef | grep ilab cloud-u+ 49975 31553 0 21:43 pts/0 00:00:00 grep --color=auto ilab [cloud-user@mdepaulo-v15-7-prod-amd ~]$ df -hl Filesystem Size Used Avail Use% Mounted on devtmpfs 4.0M 0 4.0M 0% /dev tmpfs 882G 168K 882G 1% /dev/shm tmpfs 353G 2.6M 353G 1% /run /dev/vda4 249G 39G 211G 16% /sysroot overlay 39M 39M 0 100% / tmpfs 882G 24K 882G 1% /tmp /dev/vda3 960M 101M 860M 11% /boot /dev/vda2 501M 7.1M 494M 2% /boot/efi tmpfs 177G 3.3M 177G 1% /run/user/1000 /dev/nvme1n1 3.0T 227G 2.7T 8% /var/home/cloud-user/.cache [cloud-user@mdepaulo-v15-7-prod-amd ~]$ cat /etc/profile # /etc/profile # System wide environment and startup programs, for login setup # Functions and aliases go in /etc/bashrc # It's NOT a good idea to change this file unless you know what you # are doing. It's much better to create a custom.sh shell script in # /etc/profile.d/ to make custom changes to your environment, as this # will prevent the need for merging in future updates. 
pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}

if [ -x /usr/bin/id ]; then
    if [ -z "$EUID" ]; then # ksh workaround
        EUID=`/usr/bin/id -u`
        UID=`/usr/bin/id -ru`
    fi
    USER="`/usr/bin/id -un`"
    LOGNAME=$USER
    MAIL="/var/spool/mail/$USER"
fi

# Path manipulation
if [ "$EUID" = "0" ]; then
    pathmunge /usr/sbin
    pathmunge /usr/local/sbin
else
    pathmunge /usr/local/sbin after
    pathmunge /usr/sbin after
fi

HOSTNAME=$(/usr/bin/hostnamectl --transient 2>/dev/null) || \
HOSTNAME=$(/usr/bin/hostname 2>/dev/null) || \
HOSTNAME=$(/usr/bin/uname -n)

HISTSIZE=1000
if [ "$HISTCONTROL" = "ignorespace" ] ; then
    export HISTCONTROL=ignoreboth
else
    export HISTCONTROL=ignoredups
fi

export PATH USER LOGNAME MAIL HOSTNAME HISTSIZE HISTCONTROL

for i in /etc/profile.d/*.sh /etc/profile.d/sh.local ; do
    if [ -r "$i" ]; then
        if [ "${-#*i}" != "$-" ]; then
            . "$i"
        else
            . "$i" >/dev/null
        fi
    fi
done

unset i
unset -f pathmunge

if [ -n "${BASH_VERSION-}" ] ; then
    if [ -f /etc/bashrc ] ; then
        # Bash login shells run only /etc/profile
        # Bash non-login shells run only /etc/bashrc
        # Check for double sourcing is done in /etc/bashrc.
        . /etc/bashrc
    fi
fi
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ umask
0022
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo vim /etc/dnf/
aliases.d/  dnf.conf  modules.d/  modules.defaults.d/  plugins/  protected.d/  vars/
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo vim /etc/dnf/dnf.conf
sudo: vim: command not found
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo vi /etc/dnf/dnf.conf
[cloud-user@mdepaulo-v15-7-prod-amd ~]$ ls -latr
total 28
-rw-r--r--. 1 cloud-user cloud-user  492 May 16 05:38 .bashrc
-rw-r--r--. 1 cloud-user cloud-user   18 May 16 05:38 .bash_logout
drwxr-xr-x. 3 root       root         24 May 16 17:37 ..
drwx------. 2 cloud-user cloud-user   29 May 16 17:37 .ssh
drwxr-xr-x.
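The `pathmunge` helper that /etc/profile defines above is worth pulling out on its own: it adds a directory to PATH only when it is not already present, prepending by default and appending when called with `after`. A standalone demonstration (the directory names here are just examples, run in a subshell so the real PATH is untouched):

```shell
# Copy of the pathmunge helper from /etc/profile above.
pathmunge () {
    case ":${PATH}:" in
        *:"$1":*)
            ;;
        *)
            if [ "$2" = "after" ] ; then
                PATH=$PATH:$1
            else
                PATH=$1:$PATH
            fi
    esac
}

# Exercise it in a subshell so the caller's PATH stays intact.
demo=$(
    PATH=/usr/bin:/bin
    pathmunge /usr/sbin          # not present: prepended
    pathmunge /opt/tools after   # not present: appended
    pathmunge /usr/bin           # already present: no change
    echo "$PATH"
)
echo "$demo"   # /usr/sbin:/usr/bin:/bin:/opt/tools
```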
3 cloud-user cloud-user 19 May 16 19:04 .triton drwxr-xr-x. 6 cloud-user cloud-user 66 May 16 19:07 .config drwx------. 3 cloud-user cloud-user 43 May 16 19:13 .local drwxr-xr-x. 2 cloud-user cloud-user 151 May 16 21:05 iso-testrun-orig -rw-r--r--. 1 cloud-user cloud-user 2368 May 16 21:16 EL_AI_test_1.5.sh drwxr-xr-x. 5 cloud-user cloud-user 58 May 16 21:42 .cache -rw-r--r--. 1 cloud-user cloud-user 141 May 16 21:42 .bash_profile -rw-------. 1 cloud-user cloud-user 6657 May 16 21:42 .bash_history drwxr-xr-x. 2 cloud-user cloud-user 80 May 16 21:47 iso-testrun.surprisingly-quick-because-data-existed drwx------. 10 cloud-user cloud-user 4096 May 16 22:07 . drwxr-xr-x. 2 cloud-user cloud-user 80 May 16 22:09 iso-testrun [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo dnf repolist Updating Subscription Management repositories. Repository fast-datapath-for-rhel-9-x86_64-rpms is listed more than once in the configuration repo id repo name codeready-builder-for-rhel-9-x86_64-eus-rpms Red Hat CodeReady Linux Builder for RHEL 9 x86_64 - Extended Update Support (RPMs) rhel-9-for-x86_64-appstream-eus-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream - Extended Update Support (RPMs) rhel-9-for-x86_64-appstream-rpms Red Hat Enterprise Linux 9 for x86_64 - AppStream (RPMs) rhel-9-for-x86_64-baseos-eus-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS - Extended Update Support (RPMs) rhel-9-for-x86_64-baseos-rpms Red Hat Enterprise Linux 9 for x86_64 - BaseOS (RPMs) [cloud-user@mdepaulo-v15-7-prod-amd ~]$ man rhc -bash: man: command not found [cloud-user@mdepaulo-v15-7-prod-amd ~]$ rhc --help NAME: rhc - control the system's connection to Red Hat USAGE: rhc [global options] command [command options] [arguments...] VERSION: 0.2.4 DESCRIPTION: The rhc command controls the system's connection to Red Hat. 
To connect the system using an activation key: rhc connect --organization ID --activation-key KEY To connect the system using a username and password: rhc connect --username USERNAME --password PASSWORD To disconnect the system: rhc disconnect Run 'rhc command --help' for more details. COMMANDS: connect Connects the system to Red Hat disconnect Disconnects the system from Red Hat status Prints status of the system's connection to Red Hat help, h Shows a list of commands or help for one command GLOBAL OPTIONS: --no-color (default: false) [$NO_COLOR] --help, -h show help (default: false) --version, -v print the version (default: false) [cloud-user@mdepaulo-v15-7-prod-amd ~]$ rhc connect --help NAME: rhc connect - Connects the system to Red Hat USAGE: rhc connect [command options] DESCRIPTION: The connect command connects the system to Red Hat Subscription Management, Red Hat Insights and Red Hat and activates the Remote Host Configuration daemon that enables Red Hat to interact with the system. 
For details visit: https://red.ht/connector OPTIONS: --username USERNAME, -u USERNAME register with USERNAME --password PASSWORD, -p PASSWORD register with PASSWORD --organization ID, -o ID register with ID --activation-key KEY, -a KEY register with KEY --help, -h show help (default: false) [cloud-user@mdepaulo-v15-7-prod-amd ~]$ df -hl Filesystem Size Used Avail Use% Mounted on devtmpfs 4.0M 0 4.0M 0% /dev tmpfs 882G 168K 882G 1% /dev/shm tmpfs 353G 2.6M 353G 1% /run /dev/vda4 249G 38G 211G 16% /sysroot overlay 39M 39M 0 100% / tmpfs 882G 24K 882G 1% /tmp /dev/vda3 960M 101M 860M 11% /boot /dev/vda2 501M 7.1M 494M 2% /boot/efi tmpfs 177G 3.3M 177G 1% /run/user/1000 /dev/nvme1n1 3.0T 247G 2.7T 9% /var/home/cloud-user/.cache [cloud-user@mdepaulo-v15-7-prod-amd ~]$ du -sh .data du: cannot access '.data': No such file or directory [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo du -sh .local 204M .local [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo du -sh .local 204M .local [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo du -sh .* 226G . 0 .. [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo du -sh ls^C [cloud-user@mdepaulo-v15-7-prod-amd ~]$ ls EL_AI_test_1.5.sh iso-testrun iso-testrun-orig iso-testrun.surprisingly-quick-because-data-existed [cloud-user@mdepaulo-v15-7-prod-amd ~]$ ls -latrh total 28K -rw-r--r--. 1 cloud-user cloud-user 492 May 16 05:38 .bashrc -rw-r--r--. 1 cloud-user cloud-user 18 May 16 05:38 .bash_logout drwxr-xr-x. 3 root root 24 May 16 17:37 .. drwx------. 2 cloud-user cloud-user 29 May 16 17:37 .ssh drwxr-xr-x. 3 cloud-user cloud-user 19 May 16 19:04 .triton drwxr-xr-x. 6 cloud-user cloud-user 66 May 16 19:07 .config drwx------. 3 cloud-user cloud-user 43 May 16 19:13 .local drwxr-xr-x. 2 cloud-user cloud-user 151 May 16 21:05 iso-testrun-orig -rw-r--r--. 1 cloud-user cloud-user 2.4K May 16 21:16 EL_AI_test_1.5.sh drwxr-xr-x. 5 cloud-user cloud-user 58 May 16 21:42 .cache -rw-r--r--. 1 cloud-user cloud-user 141 May 16 21:42 .bash_profile -rw-------. 
1 cloud-user cloud-user 6.6K May 16 21:42 .bash_history drwxr-xr-x. 2 cloud-user cloud-user 80 May 16 21:47 iso-testrun.surprisingly-quick-because-data-existed drwx------. 10 cloud-user cloud-user 4.0K May 16 22:07 . drwxr-xr-x. 2 cloud-user cloud-user 80 May 16 22:09 iso-testrun [cloud-user@mdepaulo-v15-7-prod-amd ~]$ ps -ef | grep ilab cloud-u+ 56428 55215 0 22:09 pts/1 00:00:02 podman run --rm -it --device /dev/kfd --device /dev/dri --security-opt label=disable --net host --shm-size 10G --pids-limit -1 -v /var/home/cloud-user:/var/home/cloud-user -v /run/user/1000/containers/auth.json:/run/containers/0/auth.json --env HF_TOKEN --env HOME --env NCCL_DEBUG --env VLLM_LOGGING_LEVEL --entrypoint ilab registry.redhat.io/rhelai1/instructlab-amd-rhel9:1.5.0 data generate cloud-u+ 56429 55215 0 22:09 pts/1 00:00:00 tee iso-testrun/ilab-data-generate cloud-u+ 56463 56461 3 22:09 pts/0 00:01:54 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69663 56463 23 23:10 pts/0 00:00:03 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69664 56463 23 23:10 pts/0 00:00:03 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69665 56463 23 23:10 pts/0 00:00:03 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69666 56463 23 23:10 pts/0 00:00:03 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69667 56463 23 23:10 pts/0 00:00:03 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69668 56463 23 23:10 pts/0 00:00:03 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69670 56463 23 23:10 pts/0 00:00:03 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69672 56463 23 23:10 pts/0 00:00:03 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69676 56463 1 23:10 pts/0 00:00:00 /opt/app-root/bin/python3.11 /opt/app-root/bin/ilab data generate cloud-u+ 69699 31553 0 23:10 pts/0 
00:00:00 grep --color=auto ilab [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo stat ~/is^C [cloud-user@mdepaulo-v15-7-prod-amd ~]$ stat iso-testrun/ilab-data-generate File: iso-testrun/ilab-data-generate Size: 117856 Blocks: 256 IO Block: 4096 regular file Device: fc04h/64516d Inode: 335545603 Links: 1 Access: (0644/-rw-r--r--) Uid: ( 1000/cloud-user) Gid: ( 1000/cloud-user) Context: unconfined_u:object_r:user_home_t:s0 Access: 2025-05-16 22:09:09.208850483 +0000 Modify: 2025-05-16 23:10:32.803864226 +0000 Change: 2025-05-16 23:10:32.803864226 +0000 Birth: 2025-05-16 22:09:09.208850483 +0000 [cloud-user@mdepaulo-v15-7-prod-amd ~]$ du -sh .local du: cannot read directory '.local/share/containers/storage/overlay/98c8240177bd8a6b73d5905378898ac7041bc83b78fd507d4fe48e57440e5b00/work': Permission denied du: cannot read directory '.local/share/containers/storage/overlay/98c8240177bd8a6b73d5905378898ac7041bc83b78fd507d4fe48e57440e5b00/merged': Permission denied du: cannot read directory '.local/share/containers/storage/overlay/9cb605b6f1f7b077e2748d722186c245084395342abf99203bbf9714b74e70e8/work': Permission denied du: cannot read directory '.local/share/containers/storage/overlay/9cb605b6f1f7b077e2748d722186c245084395342abf99203bbf9714b74e70e8/merged': Permission denied du: cannot read directory '.local/share/containers/storage/overlay/6eedafc59ab17fa9036bd7425e11d439d4c4781f090725357f3f92468cdf0eee/work': Permission denied du: cannot read directory '.local/share/containers/storage/overlay/6eedafc59ab17fa9036bd7425e11d439d4c4781f090725357f3f92468cdf0eee/merged': Permission denied du: cannot read directory '.local/share/containers/storage/overlay/3ca1c90b09f45d5bbc933f7dd8f193baa31be5efa46f4308d3d71193f3304a6e/work': Permission denied du: cannot read directory '.local/share/containers/storage/overlay/3ca1c90b09f45d5bbc933f7dd8f193baa31be5efa46f4308d3d71193f3304a6e/merged': Permission denied du: cannot read directory 
'.local/share/containers/storage/overlay/5098d9f69b7a3c6216a0e0afe8e2de94e7528371a3b9eaf998e9a50f5eb9ac54/work': Permission denied du: cannot read directory '.local/share/containers/storage/overlay/5098d9f69b7a3c6216a0e0afe8e2de94e7528371a3b9eaf998e9a50f5eb9ac54/merged': Permission denied 7.5G .local [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo du -sh .local 7.5G .local [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo find .loca^C [cloud-user@mdepaulo-v15-7-prod-amd ~]$ cd .lo^C [cloud-user@mdepaulo-v15-7-prod-amd ~]$ sudo du -sh .local 7.6G .local [cloud-user@mdepaulo-v15-7-prod-amd ~]$ nvtop [cloud-user@mdepaulo-v15-7-prod-amd ~]$ nvtop [cloud-user@mdepaulo-v15-7-prod-amd ~]$ Read from remote host 169.63.187.52: Connection timed out Connection to 169.63.187.52 closed. client_loop: send disconnect: Broken pipe mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh mikedep333-ibm-us-east ssh: connect to host 169.63.187.52 port 22: Connection timed out mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh mikedep333-ibm-us-east ^C mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh 169.63.187.52 ssh: connect to host 169.63.187.52 port 22: Connection timed out mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh 169.63.187.52 ssh: connect to host 169.63.187.52 port 22: Connection timed out mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ sh 150.240.3.148 sh: 150.240.3.148: No such file or directory mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ sh 150.240.3.148 sh: 150.240.3.148: No such file or directory mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh 150.240.3.148 @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY! 
Someone could be eavesdropping on you right now (man-in-the-middle attack)! It is also possible that a host key has just been changed. The fingerprint for the ED25519 key sent by the remote host is SHA256:aFYfaC7m+szooyVrhZ+3GaEyltY/ETUfkzPUmHePPTM. Please contact your system administrator. Add correct host key in /home/mdepaulo/.ssh/known_hosts to get rid of this message. Offending ECDSA key in /home/mdepaulo/.ssh/known_hosts:246 Host key for 150.240.3.148 has changed and you have requested strict checking. Host key verification failed. mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh-keygen -R 150.240.3.148 # Host 150.240.3.148 found: line 244 # Host 150.240.3.148 found: line 245 # Host 150.240.3.148 found: line 246 /home/mdepaulo/.ssh/known_hosts updated. Original contents retained as /home/mdepaulo/.ssh/known_hosts.old mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh 150.240.3.148 The authenticity of host '150.240.3.148 (150.240.3.148)' can't be established. ED25519 key fingerprint is SHA256:aFYfaC7m+szooyVrhZ+3GaEyltY/ETUfkzPUmHePPTM. This key is not known by any other names. Are you sure you want to continue connecting (yes/no/[fingerprint])? yes Warning: Permanently added '150.240.3.148' (ED25519) to the list of known hosts. mdepaulo@150.240.3.148: Permission denied (publickey,gssapi-keyex,gssapi-with-mic). mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh cloud-user2@150.240.3.148 cloud-user2@150.240.3.148: Permission denied (publickey,gssapi-keyex,gssapi-with-mic). 
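The host-key reset done above with `ssh-keygen -R 150.240.3.148` can be exercised safely against a scratch known_hosts file via `-f`, which is useful when scripting around hosts that get reprovisioned. This sketch generates a throwaway key so the scratch file parses cleanly; the host names are the ones from the session:

```shell
# Reproduce the stale-host-key cleanup on a scratch known_hosts file (-f),
# leaving the real ~/.ssh/known_hosts alone.
workdir=$(mktemp -d)
ssh-keygen -q -t ed25519 -N '' -f "$workdir/key"   # throwaway keypair
hostkey=$(cut -d' ' -f1,2 < "$workdir/key.pub")
printf '150.240.3.148 %s\nexample.org %s\n' "$hostkey" "$hostkey" \
    > "$workdir/known_hosts"

# Remove every entry for the reprovisioned host; ssh-keygen saves the
# original as known_hosts.old, as seen in the session output.
ssh-keygen -R 150.240.3.148 -f "$workdir/known_hosts"

grep '^example.org ' "$workdir/known_hosts"        # unrelated entries survive
```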
mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh cloud-user@150.240.3.148 Register this system with Red Hat Insights: insights-client --register Create an account or view all your systems at https://red.ht/insights-dashboard [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ cd .config/containers/^C [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ln -s /var/run/u udev/ udisks2/ user/ utmp [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ln -s /var/run/u udev/ udisks2/ user/ utmp [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ln -s /var/run/user/1000/ bus systemd/ [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ln -s /var/run/user/1000/ bus systemd/ [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ln -s /var/run/user/1000/ bus systemd/ [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ln -s /var/run/user/1000/co^C [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo mkdir /var/run/user/1000/containers [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ln -s^C [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rmdir /var/run/user/1000/containers [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ mkdir /var/run/user/1000/containers [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ln -s ^C [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo cp ^C [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ cp ~/.config/containers/auth.json /var/run/user/1000/containers/ [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vim EL_AI_test_1.5.sh -bash: vim: command not found [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vi EL_AI_test_1.5.sh [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh + podman login registry.redhat.io Authenticating with existing credentials for registry.redhat.io Existing credentials are valid. Already logged in to registry.redhat.io + ilab --version This host is not connected to Red Hat Insights. 
To connect this host to Red Hat Insights run the following command: sudo rhc connect --organization --activation-key To generate an Activation Key: https://console.redhat.com/insights/connector/activation-keys (this page will also display your Organization ID). For more information on Red Hat Insights, please visit: https://docs.redhat.com/en/documentation/subscription_central/1-latest/html/getting_started_with_activation_keys_on_the_hybrid_cloud_console/assembly-creating-managing-activation-keys [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rhc connect --organization 11009103 --activation-key mdepaulo-rhelai-qe error: non-root user cannot connect system [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rhc connect --organization 11009103 --activation-key mdepaulo-rhelai-qe Connecting mdepaulo-v157-amd-prod-2 to Red Hat. This might take a few seconds. ● Connected to Red Hat Subscription Management ● Connected to Red Hat Insights ● Activated the Remote Host Configuration daemon ● Enabled console.redhat.com services: remote configuration, insights, remediations, compliance Successfully connected to Red Hat! Manage your connected systems: https://red.ht/connector [cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh + podman login registry.redhat.io Authenticating with existing credentials for registry.redhat.io Existing credentials are valid. Already logged in to registry.redhat.io + ilab --version ilab, version 0.26.1 + sudo cp /run/user/1000/containers/auth.json /etc/ostree/ + mkdir iso-testrun + ilab config init ---------------------------------------------------- Welcome to the InstructLab CLI This guide will help you to setup your environment ---------------------------------------------------- Please provide the following values to initiate the environment [press 'Enter' for default options when prompted] Cloning https://github.com/instructlab/taxonomy.git... 
Generating config file: /var/home/cloud-user/.config/instructlab/config.yaml
Please choose a system profile.
Profiles set hardware-specific defaults for all commands and sections of the configuration.
First, please select the hardware vendor your system falls into
[0] NO SYSTEM PROFILE
[1] AMD
Enter the number of your choice [0]: 1
You selected: AMD
Next, please select the specific hardware configuration that most closely matches your system.
[0] NO SYSTEM PROFILE
[1] AMD MI300X X4
[2] AMD MI300X X2
[3] AMD MI300X X8
Enter the number of your choice [hit enter for hardware defaults] [0]: 3
You selected: /var/home/cloud-user/.local/share/instructlab/internal/system_profiles/amd/mi300x/mi300x_x8.yaml
--------------------------------------------
    Initialization completed successfully!
  You're ready to start using `ilab`. Enjoy!
--------------------------------------------
+ sed -i 's/gpus: 1/gpus: 8/g' /var/home/cloud-user/.config/instructlab/config.yaml
+ ilab config show
+ ilab system info
+ ilab model download --repository docker://registry.stage.redhat.io/rhelai1/skills-adapter-v3 --release 1.5
INFO 2025-05-19 14:14:32,583 instructlab.model.download:192: Downloading model from OCI registry:
    Model: docker://registry.stage.redhat.io/rhelai1/skills-adapter-v3@1.5
    Destination: /var/home/cloud-user/.cache/instructlab/models
Copying blob 4452b845ab9c done
Copying blob 01f47425d010 done
Copying blob cfc7749b96f6 done
Copying blob cd99f66c98e5 done
Copying blob 6f4761a5ce47 done
Copying blob 488e082ff0d1 done
Copying blob d8d4489231c6 done
Copying blob 5d44fdf2d36d done
Copying config 44136fa355 done
Writing manifest to image destination
INFO 2025-05-19 14:14:42,310 instructlab.model.download:288: ᕦ(òᴗóˇ)ᕤ docker://registry.stage.redhat.io/rhelai1/skills-adapter-v3 model download completed successfully!
ᕦ(òᴗóˇ)ᕤ INFO 2025-05-19 14:14:42,310 instructlab.model.download:302: Available models (`ilab model list`): +------------+---------------+------+---------------+ | Model Name | Last Modified | Size | Absolute path | +------------+---------------+------+---------------+ +------------+---------------+------+---------------+ + ilab model download --repository docker://registry.stage.redhat.io/rhelai1/knowledge-adapter-v3 --release 1.5 INFO 2025-05-19 14:14:48,845 instructlab.model.download:192: Downloading model from OCI registry: Model: docker://registry.stage.redhat.io/rhelai1/knowledge-adapter-v3@1.5 Destination: /var/home/cloud-user/.cache/instructlab/models Copying blob e84e60569620 done | Copying blob 4d0d6bb4d9d0 done | Copying blob 82d96d7a9e6c done | Copying blob c4334cbcdf17 done | Copying blob 488e082ff0d1 done | Copying blob cfc7749b96f6 done | Copying blob 0f17dc4a3b97 done | Copying blob d2313c03a149 done | Copying config 44136fa355 done | Writing manifest to image destination INFO 2025-05-19 14:14:54,821 instructlab.model.download:288: ᕦ(òᴗóˇ)ᕤ docker://registry.stage.redhat.io/rhelai1/knowledge-adapter-v3 model download completed successfully! 
ᕦ(òᴗóˇ)ᕤ INFO 2025-05-19 14:14:54,821 instructlab.model.download:302: Available models (`ilab model list`): +------------+---------------+------+---------------+ | Model Name | Last Modified | Size | Absolute path | +------------+---------------+------+---------------+ +------------+---------------+------+---------------+ + ilab model download --repository docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-lab-v2 --release 1.5 INFO 2025-05-19 14:15:01,263 instructlab.model.download:192: Downloading model from OCI registry: Model: docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-lab-v2@1.5 Destination: /var/home/cloud-user/.cache/instructlab/models Copying blob db47a10e7df0 [=========>----------------------------] 1.2GiB / 4.6GiB | 754.3 MiB/s Copying blob 303127a244b0 done | Copying blob 081bdeaf76fc done | Copying blob 3f4905316ed0 done | Copying blob 694fca9fdcbf [=========>----------------------------] 1.2GiB / 4.6GiB | 541.4 MiB/s Copying blob cfeca4972fa9 [=========>----------------------------] 1.2GiB / 4.6GiB | 544.5 MiB/s Copying blob 6ff9f3935185 [=========================>------------] 1.1GiB / 1.7GiB | 416.2 MiB/s Copying blob ac7915d4e17b done | Copying blob 05db54e4d322 done | Copying blob ed944c0b3d71 done | Copying blob 66a07f75fd8d done | Copying blob 80ab859339a2 done | ^C Aborted! 
^C^C^C^C^C^C^C^C^C^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ lsblk
NAME    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0     7:0    0  38.5M  1 loop
zram0   251:0    0     8G  0 disk [SWAP]
vda     252:0    0   250G  0 disk
├─vda1  252:1    0     1M  0 part
├─vda2  252:2    0   501M  0 part /boot/efi
├─vda3  252:3    0     1G  0 part /boot
└─vda4  252:4    0 248.5G  0 part /var
                                  /sysroot/ostree/deploy/default/var
                                  /etc
                                  /sysroot
vdb     252:16   0   366K  0 disk
vdc     252:32   0    44K  0 disk
vdd     252:48   0  1000G  0 disk
nvme7n1 259:0    0   2.9T  0 disk
nvme6n1 259:1    0   2.9T  0 disk
nvme2n1 259:2    0   2.9T  0 disk
nvme0n1 259:3    0   2.9T  0 disk
nvme1n1 259:4    0   2.9T  0 disk
nvme3n1 259:5    0   2.9T  0 disk
nvme4n1 259:6    0   2.9T  0 disk
nvme5n1 259:7    0   2.9T  0 disk
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo mkdir^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ls -latr ^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo sgdisk -n 1:0:0 /dev/vdd
Creating new GPT entries in memory.
The operation has completed successfully.
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ mkfs.xfs -L ilab-data /dev/vdd
mkfs.xfs: cannot open /dev/vdd: Permission denied
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo mkfs.xfs -L ilab-data /dev/vdd
mkfs.xfs: /dev/vdd appears to contain a partition table (gpt).
mkfs.xfs: Use the -f option to force overwrite.
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo mkfs.xfs -L ilab-data /dev/vdd1
meta-data=/dev/vdd1              isize=512    agcount=4, agsize=65535935 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=262143739, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=127999, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
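The partitioning and formatting just performed, together with the label-based fstab entry, mount, and chmod that the session runs next, amount to a small provisioning script. A sketch using the same device, label, and mount point; the destructive part only runs when invoked as root with an explicit --apply, so a plain invocation is a safe dry-run:

```shell
#!/usr/bin/env bash
# Sketch of the interactive data-disk setup: partition /dev/vdd, make an
# XFS filesystem labeled ilab-data, mount it at /mnt via fstab, and open
# permissions. Guarded so nothing destructive happens without --apply.
set -euo pipefail

disk=/dev/vdd
label=ilab-data
mountpoint=/mnt
fstab_line="LABEL=$label $mountpoint xfs defaults 0 0"

echo "would append to /etc/fstab: $fstab_line"

if [ "${1:-}" = "--apply" ] && [ "$(id -u)" -eq 0 ]; then
    sgdisk -n 1:0:0 "$disk"            # one partition spanning the disk
    mkfs.xfs -L "$label" "${disk}1"    # label it so fstab mounts by LABEL=
    echo "$fstab_line" >> /etc/fstab   # persist across reboots
    systemctl daemon-reload            # regenerate systemd mount units
    mount -a
    chmod 1777 "$mountpoint"           # sticky + world-writable, like /tmp
fi
```

Mounting by LABEL= rather than /dev/vdd1 keeps the entry valid if virtio device names shift between boots.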
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ echo LABEL=ilab-data /mnt xfs defaults 0 0 | sudo tee -a /etc/fstab
LABEL=ilab-data /mnt xfs defaults 0 0
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo systemctl daemon-reload
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo mount -a
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo chmod 1777 /mnt
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ mkdir -p /mnt/.config/containers
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ cp ~/.config/containers/
auth.json     storage.conf
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ cp ~/.config/containers/storage.conf /mnt/.config/containers
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vim ~/.bash_profile
-bash: vim: command not found
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ :$
-bash: :$: command not found
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vi ~/.bash_profile
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vi .bashrc
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vim .bash_profile
-bash: vim: command not found
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vim .bash_profile
-bash: vim: command not found
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vi .bash_profile
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash -i
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ exit
exit
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo chmod 1777 /mnt^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh
+ podman login registry.redhat.io
Authenticating with existing credentials for registry.redhat.io
Existing credentials are valid. Already logged in to registry.redhat.io
+ ilab --version
ilab, version 0.26.1
+ sudo cp /run/user/1000/containers/auth.json /etc/ostree/
+ mkdir iso-testrun
mkdir: cannot create directory ‘iso-testrun’: File exists
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf iso-testrun/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh
^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ls -latr
total 20
-rw-r--r--. 1 cloud-user cloud-user  492 May 16 05:38 .bashrc
-rw-r--r--. 1 cloud-user cloud-user   18 May 16 05:38 .bash_logout
drwxr-xr-x. 3 root       root         24 May 19 12:56 ..
drwx------. 2 cloud-user cloud-user   29 May 19 12:56 .ssh
-rw-r--r--. 1 cloud-user cloud-user 3566 May 19 14:09 EL_AI_test_1.5.sh
drwx------. 3 cloud-user cloud-user   19 May 19 14:09 .local
drwxr-xr-x. 5 cloud-user cloud-user   54 May 19 14:11 .config
drwxr-xr-x. 3 cloud-user cloud-user   25 May 19 14:11 .cache
-rw-r--r--. 1 cloud-user cloud-user  168 May 19 14:22 .bash_profile
-rw-------. 1 cloud-user cloud-user   10 May 19 14:23 .bash_history
drwx------. 6 cloud-user cloud-user  163 May 19 14:24 .
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf .cache/instructlab/
models/  oci/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf .cache/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf .local/share/
containers/  instructlab/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf .local/share/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ mkdir .cache
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rmdir .ca^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ls -latrZ
total 20
-rw-r--r--. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0      492 May 16 05:38 .bashrc
-rw-r--r--. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0       18 May 16 05:38 .bash_logout
drwxr-xr-x. 3 root       root       system_u:object_r:home_root_t:s0           24 May 19 12:56 ..
drwx------. 2 cloud-user cloud-user system_u:object_r:ssh_home_t:s0            29 May 19 12:56 .ssh
-rw-r--r--. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0     3566 May 19 14:09 EL_AI_test_1.5.sh
drwxr-xr-x. 5 cloud-user cloud-user unconfined_u:object_r:config_home_t:s0     54 May 19 14:11 .config
-rw-r--r--. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0      168 May 19 14:22 .bash_profile
-rw-------. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0       10 May 19 14:23 .bash_history
drwx------. 2 cloud-user cloud-user unconfined_u:object_r:gconf_home_t:s0       6 May 19 14:24 .local
drwxr-xr-x. 2 cloud-user cloud-user unconfined_u:object_r:cache_home_t:s0       6 May 19 14:24 .cache
drwx------. 6 cloud-user cloud-user unconfined_u:object_r:user_home_dir_t:s0  163 May 19 14:24 .
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo restorecon -F .cache
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ls -latr
total 20
-rw-r--r--. 1 cloud-user cloud-user  492 May 16 05:38 .bashrc
-rw-r--r--. 1 cloud-user cloud-user   18 May 16 05:38 .bash_logout
drwxr-xr-x. 3 root       root         24 May 19 12:56 ..
drwx------. 2 cloud-user cloud-user   29 May 19 12:56 .ssh
-rw-r--r--. 1 cloud-user cloud-user 3566 May 19 14:09 EL_AI_test_1.5.sh
drwxr-xr-x. 5 cloud-user cloud-user   54 May 19 14:11 .config
-rw-r--r--. 1 cloud-user cloud-user  168 May 19 14:22 .bash_profile
-rw-------. 1 cloud-user cloud-user   10 May 19 14:23 .bash_history
drwx------. 2 cloud-user cloud-user    6 May 19 14:24 .local
drwxr-xr-x. 2 cloud-user cloud-user    6 May 19 14:24 .cache
drwx------. 6 cloud-user cloud-user  163 May 19 14:24 .
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo ls -latrZ
total 20
-rw-r--r--. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0      492 May 16 05:38 .bashrc
-rw-r--r--. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0       18 May 16 05:38 .bash_logout
drwxr-xr-x. 3 root       root       system_u:object_r:home_root_t:s0           24 May 19 12:56 ..
drwx------. 2 cloud-user cloud-user system_u:object_r:ssh_home_t:s0            29 May 19 12:56 .ssh
-rw-r--r--. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0     3566 May 19 14:09 EL_AI_test_1.5.sh
drwxr-xr-x. 5 cloud-user cloud-user unconfined_u:object_r:config_home_t:s0     54 May 19 14:11 .config
-rw-r--r--. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0      168 May 19 14:22 .bash_profile
-rw-------. 1 cloud-user cloud-user unconfined_u:object_r:user_home_t:s0       10 May 19 14:23 .bash_history
drwx------. 2 cloud-user cloud-user unconfined_u:object_r:gconf_home_t:s0       6 May 19 14:24 .local
drwxr-xr-x. 2 cloud-user cloud-user unconfined_u:object_r:cache_home_t:s0       6 May 19 14:24 .cache
drwx------. 6 cloud-user cloud-user unconfined_u:object_r:user_home_dir_t:s0  163 May 19 14:24 .
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh
+ podman login registry.redhat.io
Authenticating with existing credentials for registry.redhat.io
Existing credentials are valid. Already logged in to registry.redhat.io
+ ilab --version
ilab, version 0.26.1
+ sudo cp /run/user/1000/containers/auth.json /etc/ostree/
+ mkdir iso-testrun
+ ilab config init
Existing config file was found in: /var/home/cloud-user/.config/instructlab/config.yaml
Do you still want to continue? [y/N]: ^CAborted!
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm .config/
cni/  containers/  instructlab/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm .config/ins^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rm -rf .config/instructlab/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rm -rf .config/c
cni/  containers/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rm -rf .config/cni/net.d/cni.lock
.bash_history  .bash_profile  .cache/  .local/  EL_AI_test_1.5.sh  .bash_logout  .bashrc  .config/  .ssh/  iso-testrun/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rm -rf .config/cni/net.d/cni.lock
.bash_history  .bash_profile  .cache/  .local/  EL_AI_test_1.5.sh  .bash_logout  .bashrc  .config/  .ssh/  iso-testrun/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rm -rf .config/containers/
auth.json  storage.conf
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rm -rf .config/containers/
auth.json  storage.conf
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rm -rf .config/containers/storage.conf ^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ /etc/skel/.config/containers/storage.conf /mnt/.config/containers/storage.conf
-bash: /etc/skel/.config/containers/storage.conf: Permission denied
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo ^Ctc/skel/.config/containers/storage.conf /mnt/.config/containers/storage.conf
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo chmod o^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ umask
0022
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ls -latr /etc/skel/
total 24
drwxr-xr-x.  3 root root   24 May 16 05:38 .config
-rw-r--r--.  1 root root  492 May 16 05:38 .bashrc
-rw-r--r--.  1 root root  141 May 16 05:38 .bash_profile
-rw-r--r--.  1 root root   18 May 16 05:38 .bash_logout
drwxr-xr-x.  3 root root   77 May 16 05:38 .
drwxr-xr-x. 90 root root 8192 May 19 12:56 ..
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ls -latr /etc/skel/.config/
total 0
drwxr-xr-x. 2 root root 26 May 16 05:38 containers
drwxr-xr-x. 3 root root 77 May 16 05:38 ..
drwxr-xr-x. 3 root root 24 May 16 05:38 .
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ls -latr /etc/skel/.config/containers/
total 4
-rw-r--r--. 1 root root 330 May 16 05:38 storage.conf
drwxr-xr-x. 3 root root  24 May 16 05:38 ..
drwxr-xr-x. 2 root root  26 May 16 05:38 .
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo chown cloud-user /mnt/.^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo ls -latr /mnt.config
ls: cannot access '/mnt.config': No such file or directory
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo ls -latr /mnt/
total 4
drwxr-xr-x. 24 root       root       4096 May 19 12:56 ..
drwxrwxrwt.  3 root       root         21 May 19 14:21 .
drwxr-xr-x.  3 cloud-user cloud-user   24 May 19 14:21 .config
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo ls -latr /mnt/.config/
total 0
drwxrwxrwt. 3 root       root       21 May 19 14:21 ..
drwxr-xr-x. 3 cloud-user cloud-user 24 May 19 14:21 .
drwxr-xr-x. 2 cloud-user cloud-user 26 May 19 14:21 containers
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo ls -latr /mnt/.config/containers/
total 4
drwxr-xr-x. 3 cloud-user cloud-user  24 May 19 14:21 ..
-rw-r--r--. 1 cloud-user cloud-user 330 May 19 14:21 storage.conf
drwxr-xr-x. 2 cloud-user cloud-user  26 May 19 14:21 .
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo cat /mnt/.config/containers/
cat: /mnt/.config/containers/: Is a directory
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo cat /mnt/.config/containers/storage.conf
[storage]
driver = "overlay"

[storage.options]
size = ""
remap-uids = ""
remap-gids = ""
ignore_chown_errors = ""
remap-user = ""
remap-group = ""
skip_mount_home = ""
mount_program = "/usr/bin/fuse-overlayfs"
mountopt = ""
additionalimagestores = [ "/usr/lib/containers/storage",]

[storage.options.overlay]
force_mask = "shared"
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ diff /etc/skel/.config/containers/storage.conf /mnt/.config/containers/storage.conf
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rm -rf .config/instructlab/^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh
+ podman login registry.redhat.io
Authenticating with existing credentials for registry.redhat.io
Existing credentials are valid. Already logged in to registry.redhat.io
+ ilab --version
ilab, version 0.26.1
+ sudo cp /run/user/1000/containers/auth.json /etc/ostree/
+ mkdir iso-testrun
mkdir: cannot create directory ‘iso-testrun’: File exists
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf iso-testrun/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh
+ podman login registry.redhat.io
Authenticating with existing credentials for registry.redhat.io
Existing credentials are valid. Already logged in to registry.redhat.io
+ ilab --version
ilab, version 0.26.1
+ sudo cp /run/user/1000/containers/auth.json /etc/ostree/
+ mkdir iso-testrun
+ ilab config init
Existing system profiles were found in: /var/home/cloud-user/.local/share/instructlab/internal/system_profiles
Do you want to restore these profiles to the default values? [y/N]: ^CAborted!
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo rm -rf^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ echo $ILAB_HOME

[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ exit
logout
Connection to 150.240.3.148 closed.
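The repeated manual cleanup above boils down to two ideas: point InstructLab's state at the data disk, and start each run from a clean slate. A minimal sketch of that, assuming `ilab` honors `ILAB_HOME` for its config/cache/data trees (as the `/mnt/...` paths later in this log suggest); `ilab_reset` is a hypothetical helper name, not part of the test script:

```shell
# Relocate InstructLab state to the mounted data disk (e.g. via ~/.bash_profile).
# Assumption: ilab derives all of its state paths from ILAB_HOME.
export ILAB_HOME=/mnt

# Hypothetical helper: remove all client-side InstructLab state under a home dir,
# mirroring the manual rm -rf sequence in the session above.
ilab_reset() {
    rm -rf "$1/.config/instructlab" \
           "$1/.cache/instructlab" \
           "$1/.local/share/instructlab"
}
```

With this in place, `ilab config init` starts fresh under `/mnt` instead of prompting about leftover config and system profiles.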
mdepaulo@mdepaulo-thinkpadx1nanogen2:~/rhelai/ecosystem-rhel-ai$ ssh cloud-user@150.240.3.148
Last login: Mon May 19 14:23:42 2025 from 98.116.66.226
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ env | grep HOME
HOME=/var/home/cloud-user
ILAB_HOME=/mnt
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ ls -latr
total 20
-rw-r--r--. 1 cloud-user cloud-user  492 May 16 05:38 .bashrc
-rw-r--r--. 1 cloud-user cloud-user   18 May 16 05:38 .bash_logout
drwxr-xr-x. 3 root       root         24 May 19 12:56 ..
drwx------. 2 cloud-user cloud-user   29 May 19 12:56 .ssh
-rw-r--r--. 1 cloud-user cloud-user 3566 May 19 14:09 EL_AI_test_1.5.sh
-rw-r--r--. 1 cloud-user cloud-user  168 May 19 14:22 .bash_profile
drwx------. 3 cloud-user cloud-user   19 May 19 14:25 .local
drwxr-xr-x. 3 cloud-user cloud-user   25 May 19 14:25 .cache
drwxr-xr-x. 2 cloud-user cloud-user    6 May 19 14:27 iso-testrun
drwx------. 7 cloud-user cloud-user  182 May 19 14:27 .
drwxr-xr-x. 5 cloud-user cloud-user   54 May 19 14:27 .config
-rw-------. 1 cloud-user cloud-user 1660 May 19 14:28 .bash_history
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf .config/
cni/  containers/  instructlab/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf .config/
cni/  containers/  instructlab/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf .config/
cni/  containers/  instructlab/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf .config .local/ .cache/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh
+ podman login registry.redhat.io
Authenticating with existing credentials for registry.redhat.io
Existing credentials are valid.
Already logged in to registry.redhat.io
+ ilab --version
ilab, version 0.26.1
+ sudo cp /run/user/1000/containers/auth.json /etc/ostree/
+ mkdir iso-testrun
mkdir: cannot create directory ‘iso-testrun’: File exists
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rm -rf iso-testrun/^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ rmdir iso-testrun/
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh
+ podman login registry.redhat.io
Authenticating with existing credentials for registry.redhat.io
Existing credentials are valid.
Already logged in to registry.redhat.io
+ ilab --version
ilab, version 0.26.1
+ sudo cp /run/user/1000/containers/auth.json /etc/ostree/
+ mkdir iso-testrun
+ ilab config init

----------------------------------------------------
         Welcome to the InstructLab CLI
This guide will help you to setup your environment
----------------------------------------------------

Please provide the following values to initiate the environment [press 'Enter' for default options when prompted]
Cloning https://github.com/instructlab/taxonomy.git...
Generating config file: /mnt/.config/instructlab/config.yaml

Please choose a system profile.
Profiles set hardware-specific defaults for all commands and sections of the configuration.
First, please select the hardware vendor your system falls into
[0] NO SYSTEM PROFILE
[1] AMD
Enter the number of your choice [0]: 1
You selected: AMD
Next, please select the specific hardware configuration that most closely matches your system.
[0] NO SYSTEM PROFILE
[1] AMD MI300X X4
[2] AMD MI300X X2
[3] AMD MI300X X8
Enter the number of your choice [hit enter for hardware defaults] [0]: 3
You selected: /mnt/.local/share/instructlab/internal/system_profiles/amd/mi300x/mi300x_x8.yaml

--------------------------------------------
    Initialization completed successfully!
  You're ready to start using `ilab`. Enjoy!
--------------------------------------------
+ sed -i 's/gpus: 1/gpus: 8/g' /var/home/cloud-user/.config/instructlab/config.yaml
sed: can't read /var/home/cloud-user/.config/instructlab/config.yaml: No such file or directory
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ cp EL_AI_test_1.5.sh EL_AI_test_1.5.rem^C
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vim EL_AI_test_1.5.sh
-bash: vim: command not found
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ vi EL_AI_test_1.5.sh
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ bash EL_AI_test_1.5.sh
+ podman login registry.redhat.io
Authenticating with existing credentials for registry.redhat.io
Existing credentials are valid.
Already logged in to registry.redhat.io
+ sed -i 's/gpus: 1/gpus: 8/g' /mnt/.config/instructlab/config.yaml
+ ilab config show
+ ilab system info
+ ilab model download --repository docker://registry.stage.redhat.io/rhelai1/skills-adapter-v3 --release 1.5
INFO 2025-05-19 14:36:09,612 instructlab.model.download:192: Downloading model from OCI registry:
    Model: docker://registry.stage.redhat.io/rhelai1/skills-adapter-v3@1.5
    Destination: /mnt/.cache/instructlab/models
Copying blob 6f4761a5ce47 done
Copying blob cd99f66c98e5 done
Copying blob cfc7749b96f6 done
Copying blob 4452b845ab9c done
Copying blob 488e082ff0d1 done
Copying blob 01f47425d010 done
Copying blob d8d4489231c6 done
Copying blob 5d44fdf2d36d done
Copying config 44136fa355 done
Writing manifest to image destination
INFO 2025-05-19 14:36:17,098 instructlab.model.download:288:
ᕦ(òᴗóˇ)ᕤ docker://registry.stage.redhat.io/rhelai1/skills-adapter-v3 model download completed successfully!
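The first `sed` invocation above fails because the script still hard-codes the old `$HOME` config path, while `ilab config init` wrote the file under `ILAB_HOME=/mnt`. A hedged sketch of the fix applied by editing the script: derive the config path from `ILAB_HOME` instead of hard-coding it. `set_gpus` is a hypothetical helper name, and the `gpus: 1` default it rewrites is taken from the script's own `sed` expression:

```shell
# Hypothetical helper: rewrite the gpus setting in the InstructLab config,
# resolving the config path from an explicit home dir, then ILAB_HOME, then HOME.
set_gpus() {
    local count=$1
    local cfg="${2:-${ILAB_HOME:-$HOME}}/.config/instructlab/config.yaml"
    sed -i "s/gpus: 1/gpus: $count/g" "$cfg"
}
```

With this, `set_gpus 8` follows the relocated state automatically instead of breaking when `ILAB_HOME` changes.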
ᕦ(òᴗóˇ)ᕤ
INFO 2025-05-19 14:36:17,098 instructlab.model.download:302: Available models (`ilab model list`):
+------------+---------------+------+---------------+
| Model Name | Last Modified | Size | Absolute path |
+------------+---------------+------+---------------+
+------------+---------------+------+---------------+
+ ilab model download --repository docker://registry.stage.redhat.io/rhelai1/knowledge-adapter-v3 --release 1.5
INFO 2025-05-19 14:36:23,501 instructlab.model.download:192: Downloading model from OCI registry:
    Model: docker://registry.stage.redhat.io/rhelai1/knowledge-adapter-v3@1.5
    Destination: /mnt/.cache/instructlab/models
Copying blob c4334cbcdf17 done
Copying blob 4d0d6bb4d9d0 done
Copying blob 488e082ff0d1 done
Copying blob e84e60569620 done
Copying blob 82d96d7a9e6c done
Copying blob cfc7749b96f6 done
Copying blob 0f17dc4a3b97 done
Copying blob d2313c03a149 done
Copying config 44136fa355 done
Writing manifest to image destination
INFO 2025-05-19 14:36:28,854 instructlab.model.download:288:
ᕦ(òᴗóˇ)ᕤ docker://registry.stage.redhat.io/rhelai1/knowledge-adapter-v3 model download completed successfully! ᕦ(òᴗóˇ)ᕤ
INFO 2025-05-19 14:36:28,854 instructlab.model.download:302: Available models (`ilab model list`):
+------------+---------------+------+---------------+
| Model Name | Last Modified | Size | Absolute path |
+------------+---------------+------+---------------+
+------------+---------------+------+---------------+
+ ilab model download --repository docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-lab-v2 --release 1.5
INFO 2025-05-19 14:36:35,380 instructlab.model.download:192: Downloading model from OCI registry:
    Model: docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-lab-v2@1.5
    Destination: /mnt/.cache/instructlab/models
Copying blob cfeca4972fa9 done
Copying blob 694fca9fdcbf done
Copying blob 081bdeaf76fc done
Copying blob db47a10e7df0 done
Copying blob 3f4905316ed0 done
Copying blob 303127a244b0 done
Copying blob 6ff9f3935185 done
Copying blob ac7915d4e17b done
Copying blob 05db54e4d322 done
Copying blob ed944c0b3d71 done
Copying blob 66a07f75fd8d done
Copying blob 80ab859339a2 done
Copying config 44136fa355 done
Writing manifest to image destination
INFO 2025-05-19 14:39:06,562 instructlab.model.download:288:
ᕦ(òᴗóˇ)ᕤ docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-lab-v2 model download completed successfully! ᕦ(òᴗóˇ)ᕤ
INFO 2025-05-19 14:39:06,562 instructlab.model.download:302: Available models (`ilab model list`):
+------------------------------+---------------------+---------+------------------------------------------------------+
| Model Name                   | Last Modified       | Size    | Absolute path                                        |
+------------------------------+---------------------+---------+------------------------------------------------------+
| models/granite-3.1-8b-lab-v2 | 2025-05-19 14:39:06 | 15.6 GB | /mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2 |
+------------------------------+---------------------+---------+------------------------------------------------------+
+ ilab model download --repository docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-starter-v2 --release 1.5
INFO 2025-05-19 14:39:12,995 instructlab.model.download:192: Downloading model from OCI registry:
    Model: docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-starter-v2@1.5
    Destination: /mnt/.cache/instructlab/models
Copying blob 8dc79e9b964f done
Copying blob 3b555dd64b66 done
Copying blob 303127a244b0 done
Copying blob 6262ed507463 done
Copying blob 55febb012f75 done
Copying blob 081bdeaf76fc done
Copying blob 2548cc91efbb done
Copying blob 3119969eb015 done
Copying blob ac7915d4e17b done
Copying blob 05db54e4d322 done
Copying blob ed944c0b3d71 done
Copying blob 66a07f75fd8d done
Copying blob 3e1391c11dea done
Copying blob 80ab859339a2 done
Copying config 44136fa355 done
Writing manifest to image destination
INFO 2025-05-19 14:41:46,932 instructlab.model.download:288:
ᕦ(òᴗóˇ)ᕤ docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-starter-v2 model download completed successfully! ᕦ(òᴗóˇ)ᕤ
INFO 2025-05-19 14:41:46,932 instructlab.model.download:302: Available models (`ilab model list`):
+----------------------------------+---------------------+---------+----------------------------------------------------------+
| Model Name                       | Last Modified       | Size    | Absolute path                                            |
+----------------------------------+---------------------+---------+----------------------------------------------------------+
| models/granite-3.1-8b-lab-v2     | 2025-05-19 14:39:06 | 15.6 GB | /mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2     |
| models/granite-3.1-8b-starter-v2 | 2025-05-19 14:41:46 | 15.6 GB | /mnt/.cache/instructlab/models/granite-3.1-8b-starter-v2 |
+----------------------------------+---------------------+---------+----------------------------------------------------------+
+ ilab model download --repository docker://registry.stage.redhat.io/rhelai1/mixtral-8x7b-instruct-v0-1 --release 1.5
INFO 2025-05-19 14:41:53,493 instructlab.model.download:192: Downloading model from OCI registry:
    Model: docker://registry.stage.redhat.io/rhelai1/mixtral-8x7b-instruct-v0-1@1.5
    Destination: /mnt/.cache/instructlab/models
Copying blob d0b63fca793c done
Copying blob 29e15364d8ab done
Copying blob 47324f06fdb5 done
Copying blob 40e6ecbcedfc done
Copying blob 9d56d04b36d0 done
Copying blob 54669c5aec29 done
Copying blob 67e0596920fe done
Copying blob e330eabd70b4 done
Copying blob 048fa5347877 done
Copying blob 83bfed6169c1 done
Copying blob af316ad78402 done
Copying blob 5882e4366c63 done
Copying blob 77813d1dbee6 done
Copying blob ff24540d9967 done
Copying blob 48bc12845676 done
Copying blob e56a2e7eda69 done
Copying blob da627f6a3c8f done
Copying blob 61e0f22bff93 done
Copying blob 76466bfc2312 done
Copying blob 570af3b802be done
Copying blob 4c603b65cbd5 done
Copying blob 272f33c76bca done
Copying blob a8f30ebfaf56 done
Copying blob 6fa06efa2785 done
Copying blob 11c08db21487 done
Copying blob dadfd56d7667 done
Copying blob 475361439e5c done
Copying config 44136fa355 done
Writing manifest to image destination
INFO 2025-05-19 14:51:53,163 instructlab.model.download:288:
ᕦ(òᴗóˇ)ᕤ docker://registry.stage.redhat.io/rhelai1/mixtral-8x7b-instruct-v0-1 model download completed successfully! ᕦ(òᴗóˇ)ᕤ
INFO 2025-05-19 14:51:53,163 instructlab.model.download:302: Available models (`ilab model list`):
+-----------------------------------+---------------------+---------+-----------------------------------------------------------+
| Model Name                        | Last Modified       | Size    | Absolute path                                             |
+-----------------------------------+---------------------+---------+-----------------------------------------------------------+
| models/granite-3.1-8b-lab-v2      | 2025-05-19 14:39:06 | 15.6 GB | /mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2      |
| models/granite-3.1-8b-starter-v2  | 2025-05-19 14:41:46 | 15.6 GB | /mnt/.cache/instructlab/models/granite-3.1-8b-starter-v2  |
| models/mixtral-8x7b-instruct-v0-1 | 2025-05-19 14:51:53 | 87.0 GB | /mnt/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1 |
+-----------------------------------+---------------------+---------+-----------------------------------------------------------+
+ ilab model download --repository docker://registry.stage.redhat.io/rhelai1/prometheus-8x7b-v2-0 --release 1.5
INFO 2025-05-19 14:51:59,851 instructlab.model.download:192: Downloading model from OCI registry:
    Model: docker://registry.stage.redhat.io/rhelai1/prometheus-8x7b-v2-0@1.5
    Destination: /mnt/.cache/instructlab/models
Copying blob cc0b434114a0 done
Copying blob 40e6ecbcedfc done
Copying blob 45147a3fae61 done
Copying blob a375e93d6f89 done
Copying blob 17e420ee7a3c done
Copying blob 9d56d04b36d0 done
Copying blob 07529e846183 done
Copying blob 69239081714b done
Copying blob 82ba1df1bcff done
Copying blob 7dfbb89db40a done
Copying blob d6b91c38dcac done
Copying blob 042fa6758c75 done
Copying blob fc2658c9dba2 done
Copying blob 958bf1eb6fc6 done
Copying blob 4cfc38eabca1 done
Copying blob d89723805505 done
Copying blob ad148e16985f done
Copying blob 520bd83ae1b8 done
Copying blob 189922a4c16e done
Copying blob 96b05ad26199 done
Copying blob e6086166348b done
Copying blob af6f32190c41 done
Copying blob 92470b0bd930 done
Copying blob a8f30ebfaf56 done
Copying blob 96bdbb8504d9 done
Copying blob fc4f0bd70b37 done
Copying blob dadfd56d7667 done
Copying blob 7ada2fa1461c done
Copying config 44136fa355 done
Writing manifest to image destination
INFO 2025-05-19 15:02:12,748 instructlab.model.download:288:
ᕦ(òᴗóˇ)ᕤ docker://registry.stage.redhat.io/rhelai1/prometheus-8x7b-v2-0 model download completed successfully! ᕦ(òᴗóˇ)ᕤ
INFO 2025-05-19 15:02:12,748 instructlab.model.download:302: Available models (`ilab model list`):
+-----------------------------------+---------------------+---------+-----------------------------------------------------------+
| Model Name                        | Last Modified       | Size    | Absolute path                                             |
+-----------------------------------+---------------------+---------+-----------------------------------------------------------+
| models/granite-3.1-8b-lab-v2      | 2025-05-19 14:39:06 | 15.6 GB | /mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2      |
| models/granite-3.1-8b-starter-v2  | 2025-05-19 14:41:46 | 15.6 GB | /mnt/.cache/instructlab/models/granite-3.1-8b-starter-v2  |
| models/mixtral-8x7b-instruct-v0-1 | 2025-05-19 14:51:53 | 87.0 GB | /mnt/.cache/instructlab/models/mixtral-8x7b-instruct-v0-1 |
| models/prometheus-8x7b-v2-0       | 2025-05-19 15:02:12 | 87.0 GB | /mnt/.cache/instructlab/models/prometheus-8x7b-v2-0       |
+-----------------------------------+---------------------+---------+-----------------------------------------------------------+
+ ilab taxonomy diff
compositional_skills/grounded/linguistics/inclusion/qna.yaml
compositional_skills/grounded/linguistics/writing/rewriting/qna.yaml
compositional_skills/linguistics/synonyms/qna.yaml
knowledge/arts/music/fandom/swifties/qna.yaml
knowledge/science/animals/birds/black_capped_chickadee/qna.yaml
Taxonomy in /mnt/.local/share/instructlab/taxonomy is valid :)
+ ilab model serve
INFO 2025-05-19 15:02:29,233 instructlab.model.serve_backend:80: Setting backend_type in the serve config to vllm
INFO 2025-05-19 15:02:29,249 instructlab.model.serve_backend:86: Using model '/mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2' with -1 gpu-layers and 4096 max context size.
INFO 2025-05-19 15:04:42,074 instructlab.model.serve_backend:133: '--gpus' flag used alongside '--tensor-parallel-size' in the vllm_args section of the config file. Using value of the --gpus flag.
INFO 2025-05-19 15:04:42,343 instructlab.model.backends.vllm:332: vLLM starting up on pid 6 at http://127.0.0.1:8000/v1
INFO 05-19 15:05:05 [__init__.py:239] Automatically detected platform rocm.
INFO 05-19 15:05:07 [api_server.py:1034] vLLM API server version 0.8.4
INFO 05-19 15:05:07 [api_server.py:1035] args: Namespace(host='127.0.0.1', port=8000, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template='/tmp/tmp23qjiqo4', chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=False, allowed_local_media_path=None, load_format='auto', download_dir=None, model_loader_extra_config=None, use_tqdm_on_load=True, config_format=, dtype='auto', kv_cache_dtype='auto', max_model_len=None, guided_decoding_backend='auto', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend='mp', pipeline_parallel_size=1, tensor_parallel_size=8, data_parallel_size=1, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, disable_custom_all_reduce=False, block_size=None, enable_prefix_caching=None, prefix_caching_hash_algo='builtin', disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.9, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_token=None, hf_overrides=None, enforce_eager=False, max_seq_len_to_capture=8192, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=None, speculative_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['/mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2', 'granite-3.1-8b-lab-v2', 'models/granite-3.1-8b-lab-v2', 'models/granite-3.1-8b-starter-v2', 'models/mixtral-8x7b-instruct-v0-1', 'models/prometheus-8x7b-v2-0'], qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_cascade_attn=False, disable_chunked_mm_input=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False)
INFO 05-19 15:07:35 [config.py:689] This model supports multiple tasks: {'generate', 'embed', 'score', 'reward', 'classify'}. Defaulting to 'generate'.
INFO 05-19 15:07:35 [arg_utils.py:1742] rocm is experimental on VLLM_USE_V1=1. Falling back to V0 Engine.
WARNING 05-19 15:07:35 [arg_utils.py:1603] The model has a long context length (131072). This may cause OOM during the initial memory profiling phase, or result in low performance due to small KV cache size. Consider setting --max-model-len to a smaller value.
INFO 05-19 15:09:55 [api_server.py:246] Started engine process with PID 59
INFO 05-19 15:09:59 [__init__.py:239] Automatically detected platform rocm.
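The vLLM warning above suggests capping `--max-model-len`. Since the earlier serve log shows extra flags being passed through the `vllm_args` section of `config.yaml` (it mentions `--tensor-parallel-size` there), one hedged way to act on the warning would be a fragment like the following; the exact key layout may differ between releases, so verify against `ilab config show`:

```yaml
# Sketch only: cap the context window to address the long-context OOM warning.
serve:
  vllm:
    vllm_args:
      - --max-model-len
      - "4096"
```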
INFO 05-19 15:10:00 [llm_engine.py:243] Initializing a V0 LLM engine (v0.8.4) with config: model='/mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2', speculative_config=None, tokenizer='/mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=131072, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=8, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=/mnt/.cache/instructlab/models/granite-3.1-8b-lab-v2, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=True, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"max_capture_size":256}, use_cached_outputs=True, WARNING 05-19 15:10:00 [multiproc_worker_utils.py:306] Reducing Torch parallelism from 104 threads to 1 to avoid unnecessary CPU contention. Set OMP_NUM_THREADS in the external environment to tune this value as needed. INFO 05-19 15:10:04 [__init__.py:239] Automatically detected platform rocm. INFO 05-19 15:10:04 [__init__.py:239] Automatically detected platform rocm. INFO 05-19 15:10:04 [__init__.py:239] Automatically detected platform rocm. INFO 05-19 15:10:04 [__init__.py:239] Automatically detected platform rocm. 
INFO 05-19 15:10:04 [__init__.py:239] Automatically detected platform rocm.
INFO 05-19 15:10:04 [__init__.py:239] Automatically detected platform rocm.
INFO 05-19 15:10:04 [__init__.py:239] Automatically detected platform rocm.
(VllmWorkerProcess pid=82) INFO 05-19 15:10:06 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=85) INFO 05-19 15:10:06 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=84) INFO 05-19 15:10:06 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=87) INFO 05-19 15:10:06 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=83) INFO 05-19 15:10:06 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=86) INFO 05-19 15:10:06 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
(VllmWorkerProcess pid=81) INFO 05-19 15:10:06 [multiproc_worker_utils.py:225] Worker ready; awaiting tasks
^CINFO 2025-05-19 15:13:19,404 instructlab.model.backends.vllm:85: vLLM server terminated by keyboard
Traceback (most recent call last):
  File "/usr/lib64/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/opt/app-root/lib64/python3.11/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
           ^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1069, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 146, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/lib64/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 264, in build_async_engine_client_from_engine_args
    await mq_engine_client.setup()
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/client.py", line 284, in setup
    response = await self._wait_for_server_rpc(socket)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/client.py", line 392, in _wait_for_server_rpc
    return await self._send_get_data_rpc_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/engine/multiprocessing/client.py", line 320, in _send_get_data_rpc_request
    if await socket.poll(timeout=VLLM_RPC_TIMEOUT) == 0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "", line 198, in _run_module_as_main
  File "", line 88, in _run_code
  File "/opt/app-root/lib64/python3.11/site-packages/vllm/entrypoints/openai/api_server.py", line 1121, in 
    uvloop.run(run_server(args))
  File "/opt/app-root/lib64/python3.11/site-packages/uvloop/__init__.py", line 105, in run
    return runner.run(wrapper())
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/asyncio/runners.py", line 123, in run
    raise KeyboardInterrupt()
KeyboardInterrupt
INFO 2025-05-19 15:13:20,776 instructlab.model.backends.vllm:512: Waiting for GPU VRAM reclamation...
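The shutdown above ends with "Waiting for GPU VRAM reclamation..."; if the next `ilab model serve` or `ilab model chat` is launched before the old vLLM workers have fully exited, the new server can collide with the old listener or fail to allocate VRAM. A minimal sketch of a poll that waits until nothing is listening on the old port before restarting (the function name and retry counts are illustrative, not part of `ilab` or vLLM):

```shell
#!/usr/bin/env bash
# Hypothetical helper: block until no process accepts connections on
# host:port, so a restarted server will not collide with the old one.
wait_for_port_free() {
  local host=$1 port=$2 tries=${3:-30}
  local i
  for ((i = 1; i <= tries; i++)); do
    # The bash /dev/tcp connect succeeds only while a listener is still up.
    if ! (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0   # nothing is listening any more
    fi
    sleep 2
  done
  return 1       # still occupied after all retries
}
```

Usage would look like `wait_for_port_free 127.0.0.1 8000 && ilab model serve`, run after the ^C above.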
+ ilab model chat
INFO 2025-05-19 15:14:04,611 instructlab.model.backends.vllm:115: Trying to connect to model server at http://127.0.0.1:8000/v1
INFO 2025-05-19 15:14:06,222 instructlab.model.backends.vllm:332: vLLM starting up on pid 5 at http://127.0.0.1:54991/v1
INFO 2025-05-19 15:14:06,222 instructlab.model.backends.vllm:123: Starting a temporary vLLM server at http://127.0.0.1:54991/v1
INFO 2025-05-19 15:14:06,222 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment... Attempt: 1/120
INFO 2025-05-19 15:14:09,517 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment... Attempt: 2/120
INFO 2025-05-19 15:14:12,847 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment... Attempt: 3/120
INFO 2025-05-19 15:14:16,168 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment... Attempt: 4/120
INFO 2025-05-19 15:14:19,399 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment... Attempt: 5/120
INFO 2025-05-19 15:14:22,694 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment... Attempt: 6/120
INFO 2025-05-19 15:14:26,002 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment... Attempt: 7/120
INFO 2025-05-19 15:14:29,375 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment... Attempt: 8/120
INFO 2025-05-19 15:14:32,813 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment...
[Attempt: 9/120 through Attempt: 118/120 omitted: the identical "Waiting for the vLLM server to start at http://127.0.0.1:54991/v1" INFO message repeated every ~3 seconds, 15:14:36 through 15:20:37]
Attempt: 119/120
INFO 2025-05-19 15:20:40,373 instructlab.model.backends.vllm:138: Waiting for the vLLM server to start at http://127.0.0.1:54991/v1, this might take a moment... Attempt: 120/120
INFO 2025-05-19 15:20:41,685 instructlab.model.backends.vllm:148: Gave up waiting for vLLM server to start at http://127.0.0.1:54991/v1 after 120 attempts
INFO 2025-05-19 15:20:51,912 instructlab.model.backends.vllm:512: Waiting for GPU VRAM reclamation...
Traceback (most recent call last):
  File "/opt/app-root/bin/ilab", line 8, in 
    sys.exit(ilab())
             ^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/clickext.py", line 356, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/cli/model/chat.py", line 199, in chat
    chat_model(
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/chat.py", line 688, in chat_model
    api_base = backend_instance.run_detached(http_client(params))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/vllm.py", line 179, in run_detached
    raise e
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/vllm.py", line 169, in run_detached
    vllm_server_process, api_base = self._ensure_server(
                                    ^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/instructlab/model/backends/vllm.py", line 156, in _ensure_server
    raise ServerException(f"vLLM failed to start up in {duration} seconds")
instructlab.model.backends.common.ServerException: vLLM failed to start up in 395.5 seconds
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ cat EL_AI_test_1.5.sh
set -eux
#####
# podman login registry.stage.redhat.io # add credentials
podman login registry.redhat.io # add credentials
#ilab --version
# to get a rhc connect command!
#sudo cp /run/user/1000/containers/auth.json /etc/ostree/ # to make bootc switch work
################
mkdir -p iso-testrun
#ilab config init
#sed -i '/--tensor-parallel-size/,+1d' $HOME/.config/instructlab/config.yaml
sed -i 's/gpus: 1/gpus: 8/g' $HOME/.config/instructlab/config.yaml
ilab config show > iso-testrun/ilab-config-show
ilab system info > iso-testrun/ilab-system-info
### Pay attention to what models are to be used for testing the specific releases; this is valid for 1.5!
### Also, pay attention to the .stage in the url - if you're doing prod testing, it'd be docker://registry.redhat.io
ilab model download --repository docker://registry.stage.redhat.io/rhelai1/skills-adapter-v3 --release 1.5
ilab model download --repository docker://registry.stage.redhat.io/rhelai1/knowledge-adapter-v3 --release 1.5
ilab model download --repository docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-lab-v2 --release 1.5
ilab model download --repository docker://registry.stage.redhat.io/rhelai1/granite-3.1-8b-starter-v2 --release 1.5
ilab model download --repository docker://registry.stage.redhat.io/rhelai1/mixtral-8x7b-instruct-v0-1 --release 1.5
ilab model download --repository docker://registry.stage.redhat.io/rhelai1/prometheus-8x7b-v2-0 --release 1.5
# END OF MODEL DOWNLOADS
ilab taxonomy diff
ilab model serve # Ctrl + C after the vLLM server starts
ilab model chat
# tmux
time ilab data generate | tee iso-testrun/ilab-data-generate
# CTRL + B + D
# tail -f iso-testrun/ilab-data-generate # to watch progress and not stress about ssh connection drops
# occasionally check the output of nvidia-smi -l 3 (rocm-smi or nvtop on AMD hosts)
### end of data generation
shuf -n 15000 .local/share/instructlab/datasets/`ls -1 .local/share/instructlab/datasets/ | head -n1`/skills_train_msgs_*.jsonl > .local/share/instructlab/datasets/`ls -1 .local/share/instructlab/datasets/ | head -n1`/skills_train_msgs_reduced.jsonl
# tmux a
time ilab model train -y --force-clear-phased-cache --enable-serving-output --strategy lab-multiphase --phased-phase1-data ~/.local/share/instructlab/datasets/`ls -1 ~/.local/share/instructlab/datasets/ | head -n1`/knowledge_train_msgs_*.jsonl --phased-phase2-data ~/.local/share/instructlab/datasets/`ls -1 ~/.local/share/instructlab/datasets/ | head -n1`/skills_train_msgs_reduced.jsonl --phased-phase1-num-epochs 2 --phased-phase2-num-epochs 2 | tee iso-testrun/ilab-train
# CTRL + B + D
# tail -f iso-testrun/ilab-train # to watch progress and not stress about ssh connection drops
# ^^^ This is for the "short" training form which we currently use
## Do the following only if you're absolutely sure you should be doing the "long" testing!
#################### LOOOOONG ##########################
#tmux a
#time ilab model train -y --force-clear-phased-cache --enable-serving-output --strategy lab-multiphase --phased-phase1-data ~/.local/share/instructlab/datasets/`ls -1 ~/.local/share/instructlab/datasets/ | head -n1`/knowledge_train_msgs_*.jsonl --phased-phase2-data ~/.local/share/instructlab/datasets/`ls -1 ~/.local/share/instructlab/datasets/ | head -n1`/skills_train_msgs_20*.jsonl --phased-phase1-num-epochs 2 --phased-phase2-num-epochs 2 | tee iso-testrun/ilab-train
## CTRL + B + D
#tail -f iso-testrun/ilab-train # to watch progress and not stress about ssh connection drops
#################### LOOOOONG ##########################
# And finally verify the chat works
ilab model chat
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ nvtop
[cloud-user@mdepaulo-v157-amd-prod-2 ~]$ sudo nvtop
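The script above selects a dataset directory with `ls -1 ... | head -n1`, which picks the first entry alphabetically; on a box where `ilab data generate` has run more than once, that is not necessarily the latest run. A small sketch of an alternative (the helper name is illustrative; `ls -t` sorts by modification time, newest first):

```shell
#!/usr/bin/env bash
# Illustrative helper: return the most recently modified entry under a
# datasets root, instead of the alphabetically first one.
latest_dataset_dir() {
  local root=$1
  # -1: one entry per line; -t: sort by mtime, newest first
  ls -1t "$root" | head -n1
}
```

With this, each `` `ls -1 ~/.local/share/instructlab/datasets/ | head -n1` `` in the train and shuf commands could be replaced by `` `latest_dataset_dir ~/.local/share/instructlab/datasets` ``.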