Bug
Resolution: Unresolved
Critical
RHELAI 1.3 GA
False
False
Release Notes
Critical
Approved
To Reproduce
Steps to reproduce the behavior:
- On Intel Gaudi, with the serve gpus and tensor parallel parameters set to 1, run `ilab model chat`
- Ask a question
>>> From what is composed the water ? [S][default]
╭────────────────────────────────────── granite-7b-redhat-lab ──────────────────────────────────────╮
│ Water is a fascinating substance, and it is primarily composed of two elements: hydrogen and o │
╰─────────────────────────────────────────────────────────────────────────── elapsed 201.699 seconds ─╯
>>>
- The same happens for `rm -rf /`:
>>> rm -rf / [S][default]
╭────────────────────────────────────── granite-7b-redhat-lab ──────────────────────────────────────╮
│ ``` │
╰───────────────────────────────────────────────────────────────────────── elapsed 118.725 seconds ─╯
Tried running serve and chat in separate tmux sessions; the behavior is the same (a direct check against the serve API is sketched below):
>>> Fromm what is water composed ? [S][default]
╭────────────────────────────────────── granite-7b-redhat-lab ──────────────────────────────────────╮
│ W │
╰─────────────────────────────────────────────────────────────────────────── elapsed 6.009 seconds ─╯
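To narrow down whether the truncation comes from the `ilab model chat` client or from the served model itself, the OpenAI-compatible endpoint started by `ilab model serve` can be queried directly. This is only a sketch: the endpoint address (http://127.0.0.1:8000/v1, assumed to be the default serve address) and the model id are assumptions and should be replaced with whatever the running server actually reports.

# List the model id the server exposes (endpoint address is an assumption: default serve address)
curl -s http://127.0.0.1:8000/v1/models | jq -r '.data[0].id'

# Repeat the question with an explicit max_tokens (the serve log below shows max_tokens=None);
# substitute the model id printed by the command above
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "granite-7b-redhat-lab", "messages": [{"role": "user", "content": "From what is composed the water ?"}], "max_tokens": 512}' \
  | jq '.choices[0].message.content, .choices[0].finish_reason'

If the answer comes back complete here, the truncation is likely in the chat client's rendering; if it is cut off in the same way, the problem is on the serving side.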
serve log:
INFO 12-05 14:05:37 logger.py:36] Received request chat-e12563e5bb954b059a0b657313104c5d: prompt: '<|system|>\nI am, Red Hat® Instruct Model based on Granite 7B, an AI language model developed by Red Hat and IBM Research, based on the Granite-7b-base language model. My primary function is to be a chat assistant.\n<|user|>\nFromm what is water composed ?\n<|user|>\nFromm what is water composed ?\n<|assistant|>\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=None, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [32003, 29871, 13, 29902, 626, 29892, 4367, 25966, 30342, 2799, 1247, 8125, 2729, 373, 6274, 568, 29871, 29955, 29933, 29892, 385, 319, 29902, 4086, 1904, 8906, 491, 4367, 25966, 322, 27955, 10550, 29892, 2729, 373, 278, 6274, 568, 29899, 29955, 29890, 29899, 3188, 4086, 1904, 29889, 1619, 7601, 740, 338, 304, 367, 263, 13563, 20255, 29889, 13, 32004, 29871, 13, 4591, 29885, 825, 338, 4094, 13725, 1577, 13, 32004, 29871, 13, 4591, 29885, 825, 338, 4094, 13725, 1577, 13, 32005, 29871, 13], lora_request: None, prompt_adapter_request: None.
INFO: 127.0.0.1:41458 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO 12-05 14:05:37 async_llm_engine.py:173] Added request chat-e12563e5bb954b059a0b657313104c5d.
INFO 12-05 14:05:37 metrics.py:406] Avg prompt throughput: 16.4 tokens/s, Avg generation throughput: 5.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.2%, CPU KV cache usage: 1.6%.
INFO 12-05 14:05:42 metrics.py:406] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 44.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 1.6%.
Expected behavior
- The model returns a complete answer to the question. Instead, the response is cut off after a few characters (and, in the tmux case, after a single character).
Screenshots
- Attached Image
Device Info (please complete the following information):
- Hardware Specs: 8x Intel Gaudi 3 accelerators (GAUDI3), 288 CPU cores, 2265 GB memory
- OS Version: Red Hat Enterprise Linux 9.4 (Plow), kernel 5.14.0-427.42.1.el9_4.x86_64
- InstructLab Version: 0.21.0
- Provide the output of these two commands (a one-liner that collects both outputs is sketched after them):
- `sudo bootc status --format json | jq .status.booted.image.image.image` to print the name and tag of the bootc image; it should look like `registry.stage.redhat.io/rhelai1/bootc-intel-rhel9:1.3-1732894187`
[root@localhost ~]# bootc status --format json | jq .status.booted.image.image.image
"registry.redhat.io/rhelai1/bootc-intel-rhel9:1.3-1733319681"
- `ilab system info` to print detailed information about InstructLab version, OS, and hardware, including GPU / AI accelerator hardware
[root@localhost ~]# ilab system info
/usr/lib64/python3.11/inspect.py:389: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
  return isinstance(object, types.FunctionType)
============================= HABANA PT BRIDGE CONFIGURATION ===========================
 PT_HPU_LAZY_MODE = 1
 PT_RECIPE_CACHE_PATH =
 PT_CACHE_FOLDER_DELETE = 0
 PT_HPU_RECIPE_CACHE_CONFIG =
 PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
 PT_HPU_LAZY_ACC_PAR_MODE = 1
 PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
 PT_HPU_EAGER_PIPELINE_ENABLE = 1
 PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
---------------------------: System Configuration :---------------------------
Num CPU Cores : 288
CPU RAM       : -1919526024 KB
------------------------------------------------------------------------------
Platform:
  sys.version: 3.11.7 (main, Oct 9 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
  sys.platform: linux
  os.name: posix
  platform.release: 5.14.0-427.42.1.el9_4.x86_64
  platform.machine: x86_64
  platform.node: localhost
  platform.python_version: 3.11.7
  os-release.ID: rhel
  os-release.VERSION_ID: 9.4
  os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
  memory.total: 2265.40 GB
  memory.available: 2246.60 GB
  memory.used: 10.00 GB

InstructLab:
  instructlab.version: 0.21.0
  instructlab-dolomite.version: 0.2.0
  instructlab-eval.version: 0.4.1
  instructlab-quantize.version: 0.1.0
  instructlab-schema.version: 0.4.1
  instructlab-sdg.version: 0.6.1
  instructlab-training.version: 0.6.1

Torch:
  torch.version: 2.4.0a0+git74cd574
  torch.backends.cpu.capability: AVX512
  torch.version.cuda: None
  torch.version.hip: None
  torch.cuda.available: False
  torch.backends.cuda.is_built: False
  torch.backends.mps.is_built: False
  torch.backends.mps.is_available: False
  habana_torch_plugin.version: 1.18.0.524
  torch.hpu.is_available: True
  torch.hpu.device_count: 8
  torch.hpu.0.name: GAUDI3
  torch.hpu.0.capability: 1.18.0.1b7f293
  torch.hpu.0.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
  torch.hpu.1.name: GAUDI3
  torch.hpu.1.capability: 1.18.0.1b7f293
  torch.hpu.1.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
  torch.hpu.2.name: GAUDI3
  torch.hpu.2.capability: 1.18.0.1b7f293
  torch.hpu.2.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
  torch.hpu.3.name: GAUDI3
  torch.hpu.3.capability: 1.18.0.1b7f293
  torch.hpu.3.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
  torch.hpu.4.name: GAUDI3
  torch.hpu.4.capability: 1.18.0.1b7f293
  torch.hpu.4.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
  torch.hpu.5.name: GAUDI3
  torch.hpu.5.capability: 1.18.0.1b7f293
  torch.hpu.5.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
  torch.hpu.6.name: GAUDI3
  torch.hpu.6.capability: 1.18.0.1b7f293
  torch.hpu.6.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
  torch.hpu.7.name: GAUDI3
  torch.hpu.7.capability: 1.18.0.1b7f293
  torch.hpu.7.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
  env.HABANA_LOGS: /var/log/habana_logs/
  env.HABANA_PLUGINS_LIB_PATH: /opt/habanalabs/habana_plugins
  env.HABANA_PROFILE: profile_api_light
  env.HABANA_SCAL_BIN_PATH: /opt/habanalabs/engines_fw

llama_cpp_python:
  llama_cpp_python.version: 0.2.79
  llama_cpp_python.supports_gpu_offload: False
[root@localhost ~]#
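As a convenience, both diagnostics can be captured into a single file for attaching to the ticket; this is just a sketch reusing the two commands listed above, with device-info.txt as an arbitrary output file name.

# Collect both diagnostic outputs (same commands as above) into one file
{ sudo bootc status --format json | jq .status.booted.image.image.image; ilab system info; } > device-info.txt 2>&1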