Red Hat Enterprise Linux AI / RHELAI-2558

RHELAI 1.3 Intel: Chat response with llama2/granite stops in the middle of a phrase.


    • Critical
    • Approved

      To Reproduce

      Steps to reproduce the behavior:

      1. On Intel Gaudi, with the serve GPU count and tensor-parallel parameters set to 1, run `ilab model chat`
      2. Ask a question
      3. Observe that the response is cut off mid-sentence:

         >>> From what is composed the water ?                                                                              [S][default]
        ╭────────────────────────────────────────────────────────────────── granite-7b-redhat-lab ───────────────────────────────────────────────────────────────────╮
        │ Water is a fascinating substance, and it is primarily composed of two elements: hydrogen and o                                                             │
        ╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────── elapsed 201.699 seconds ─╯

      >>>                                                                                                                                               

      4. The same behavior occurs for the input `rm -rf /`:

         >>> rm -rf /                                                                             [S][default]
        ╭────────────────────────────────────── granite-7b-redhat-lab ──────────────────────────────────────╮
        │ ```                                                                                               │
        ╰───────────────────────────────────────────────────────────────────────── elapsed 118.725 seconds ─╯

      Running `serve` and `chat` in separate tmux sessions shows the same behavior:

      >>> Fromm what is water composed ?                                                       [S][default]
      ╭────────────────────────────────────── granite-7b-redhat-lab ──────────────────────────────────────╮
      │ W                                                                                                 │
      ╰─────────────────────────────────────────────────────────────────────────── elapsed 6.009 seconds ─╯
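
      To separate the `ilab model chat` client from the server, a minimal check (a sketch, not part of the original report) is to send the same question straight to the vLLM OpenAI-compatible endpoint with an explicit `max_tokens` and look at the returned `finish_reason`. The base URL, port, and model name below are assumptions; adjust them to whatever `ilab model serve` reports.

          # Hedged sketch: query the served model directly, bypassing the chat TUI.
          # URL, port, and model name are assumptions, not values from the report.
          import requests

          resp = requests.post(
              "http://127.0.0.1:8000/v1/chat/completions",   # assumed ilab serve address
              json={
                  "model": "granite-7b-redhat-lab",          # assumed served model name
                  "messages": [
                      {"role": "user", "content": "From what is composed the water ?"},
                  ],
                  "max_tokens": 512,  # explicit cap, to rule out a client-side default
              },
              timeout=300,
          )
          resp.raise_for_status()
          choice = resp.json()["choices"][0]
          # "stop" means EOS or a stop string was hit; "length" means the token cap was hit.
          print(choice["finish_reason"])
          print(choice["message"]["content"])

      If the answer comes back complete here, the truncation is more likely happening in the chat client or its streaming path rather than in vLLM itself.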

      serve log:

      INFO 12-05 14:05:37 logger.py:36] Received request chat-e12563e5bb954b059a0b657313104c5d: prompt: '<|system|>\nI am, Red Hat® Instruct Model based on Granite 7B, an AI language model developed by Red Hat and IBM Research, based on the Granite-7b-base language model. My primary function is to be a chat assistant.\n<|user|>\nFromm what is water composed ?\n<|user|>\nFromm what is water composed ?\n<|assistant|>\n', params: SamplingParams(n=1, best_of=1, presence_penalty=0.0, frequency_penalty=0.0, repetition_penalty=1.0, temperature=1.0, top_p=1.0, top_k=-1, min_p=0.0, seed=None, use_beam_search=False, length_penalty=1.0, early_stopping=False, stop=[], stop_token_ids=[], include_stop_str_in_output=False, ignore_eos=False, max_tokens=None, min_tokens=0, logprobs=None, prompt_logprobs=None, skip_special_tokens=True, spaces_between_special_tokens=True, truncate_prompt_tokens=None), prompt_token_ids: [32003, 29871, 13, 29902, 626, 29892, 4367, 25966, 30342, 2799, 1247, 8125, 2729, 373, 6274, 568, 29871, 29955, 29933, 29892, 385, 319, 29902, 4086, 1904, 8906, 491, 4367, 25966, 322, 27955, 10550, 29892, 2729, 373, 278, 6274, 568, 29899, 29955, 29890, 29899, 3188, 4086, 1904, 29889, 1619, 7601, 740, 338, 304, 367, 263, 13563, 20255, 29889, 13, 32004, 29871, 13, 4591, 29885, 825, 338, 4094, 13725, 1577, 13, 32004, 29871, 13, 4591, 29885, 825, 338, 4094, 13725, 1577, 13, 32005, 29871, 13], lora_request: None, prompt_adapter_request: None.
      INFO:     127.0.0.1:41458 - "POST /v1/chat/completions HTTP/1.1" 200 OK
      INFO 12-05 14:05:37 async_llm_engine.py:173] Added request chat-e12563e5bb954b059a0b657313104c5d.
      INFO 12-05 14:05:37 metrics.py:406] Avg prompt throughput: 16.4 tokens/s, Avg generation throughput: 5.6 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.2%, CPU KV cache usage: 1.6%.
      INFO 12-05 14:05:42 metrics.py:406] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 44.4 tokens/s, Running: 1 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.4%, CPU KV cache usage: 1.6%.
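
      The logged SamplingParams show `max_tokens=None` and `stop=[]`, and the rendered prompt contains the user turn twice ("Fromm what is water composed ?" appears two times before `<|assistant|>`). As a further hedged check (again an assumption-laden sketch, not part of the original report), the snippet below streams the same request and records the last `finish_reason`, to see whether the stream itself ends early or only the chat client stops rendering it. Endpoint, port, and model name are assumptions.

          # Hedged sketch: stream the request and count what actually arrives.
          import json
          import requests

          with requests.post(
              "http://127.0.0.1:8000/v1/chat/completions",   # assumed ilab serve address
              json={
                  "model": "granite-7b-redhat-lab",          # assumed served model name
                  "messages": [{"role": "user", "content": "Fromm what is water composed ?"}],
                  "stream": True,
              },
              stream=True,
              timeout=300,
          ) as resp:
              resp.raise_for_status()
              text, finish_reason = "", None
              for line in resp.iter_lines():
                  # Server-sent events: payload lines look like "data: {...}".
                  if not line or not line.startswith(b"data: "):
                      continue
                  payload = line[len(b"data: "):]
                  if payload == b"[DONE]":
                      break
                  choice = json.loads(payload)["choices"][0]
                  text += choice["delta"].get("content") or ""
                  finish_reason = choice.get("finish_reason") or finish_reason
              print("finish_reason:", finish_reason)
              print("received", len(text), "characters")

      Comparing the character count and `finish_reason` here with what the chat box shows (for example "hydrogen and o" after 201.699 seconds) should indicate whether tokens are generated but lost, or never generated at all.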

      Expected behavior

      • The chat response is returned in full instead of stopping in the middle of a phrase (or after a single character).

      Screenshots

      • Attached Image 

      Device Info (please complete the following information):

      • Hardware Specs: 8× Intel Gaudi 3 accelerators, 288 CPU cores, ~2.2 TB RAM (see `ilab system info` output below)
      • OS Version: Red Hat Enterprise Linux 9.4 (Plow), kernel 5.14.0-427.42.1.el9_4.x86_64
      • InstructLab Version: 0.21.0 (output of `ilab --version`)
      • Provide the output of these two commands:
        • [root@localhost ~]# bootc status --format json | jq .status.booted.image.image.image
          "registry.redhat.io/rhelai1/bootc-intel-rhel9:1.3-1733319681"
        • ilab system info to print detailed information about InstructLab version, OS, and hardware, including GPU / AI accelerator hardware:
          [root@localhost ~]# ilab system info
          /usr/lib64/python3.11/inspect.py:389: FutureWarning: `torch.distributed.reduce_op` is deprecated, please use `torch.distributed.ReduceOp` instead
            return isinstance(object, types.FunctionType)
          ============================= HABANA PT BRIDGE CONFIGURATION =========================== 
           PT_HPU_LAZY_MODE = 1
           PT_RECIPE_CACHE_PATH = 
           PT_CACHE_FOLDER_DELETE = 0
           PT_HPU_RECIPE_CACHE_CONFIG = 
           PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
           PT_HPU_LAZY_ACC_PAR_MODE = 1
           PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
           PT_HPU_EAGER_PIPELINE_ENABLE = 1
           PT_HPU_EAGER_COLLECTIVE_PIPELINE_ENABLE = 1
          ---------------------------: System Configuration :---------------------------
          Num CPU Cores : 288
          CPU RAM       : -1919526024 KB
          ------------------------------------------------------------------------------
          Platform:
            sys.version: 3.11.7 (main, Oct  9 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
            sys.platform: linux
            os.name: posix
            platform.release: 5.14.0-427.42.1.el9_4.x86_64
            platform.machine: x86_64
            platform.node: localhost
            platform.python_version: 3.11.7
            os-release.ID: rhel
            os-release.VERSION_ID: 9.4
            os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
            memory.total: 2265.40 GB
            memory.available: 2246.60 GB
            memory.used: 10.00 GB
          InstructLab:
            instructlab.version: 0.21.0
            instructlab-dolomite.version: 0.2.0
            instructlab-eval.version: 0.4.1
            instructlab-quantize.version: 0.1.0
            instructlab-schema.version: 0.4.1
            instructlab-sdg.version: 0.6.1
            instructlab-training.version: 0.6.1
          Torch:
            torch.version: 2.4.0a0+git74cd574
            torch.backends.cpu.capability: AVX512
            torch.version.cuda: None
            torch.version.hip: None
            torch.cuda.available: False
            torch.backends.cuda.is_built: False
            torch.backends.mps.is_built: False
            torch.backends.mps.is_available: False
            habana_torch_plugin.version: 1.18.0.524
            torch.hpu.is_available: True
            torch.hpu.device_count: 8
            torch.hpu.0.name: GAUDI3
            torch.hpu.0.capability: 1.18.0.1b7f293
            torch.hpu.0.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
            torch.hpu.1.name: GAUDI3
            torch.hpu.1.capability: 1.18.0.1b7f293
            torch.hpu.1.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
            torch.hpu.2.name: GAUDI3
            torch.hpu.2.capability: 1.18.0.1b7f293
            torch.hpu.2.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
            torch.hpu.3.name: GAUDI3
            torch.hpu.3.capability: 1.18.0.1b7f293
            torch.hpu.3.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
            torch.hpu.4.name: GAUDI3
            torch.hpu.4.capability: 1.18.0.1b7f293
            torch.hpu.4.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
            torch.hpu.5.name: GAUDI3
            torch.hpu.5.capability: 1.18.0.1b7f293
            torch.hpu.5.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
            torch.hpu.6.name: GAUDI3
            torch.hpu.6.capability: 1.18.0.1b7f293
            torch.hpu.6.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
            torch.hpu.7.name: GAUDI3
            torch.hpu.7.capability: 1.18.0.1b7f293
            torch.hpu.7.properties: sramBaseAddress=144396662951903232, dramBaseAddress=144396800491520000, sramSize=0, dramSize=136465870848, tpcEnabledMask=18446744073709551615, dramEnabled=1, fd=18, device_id=0, device_type=5
            env.HABANA_LOGS: /var/log/habana_logs/
            env.HABANA_PLUGINS_LIB_PATH: /opt/habanalabs/habana_plugins
            env.HABANA_PROFILE: profile_api_light
            env.HABANA_SCAL_BIN_PATH: /opt/habanalabs/engines_fw
          llama_cpp_python:
            llama_cpp_python.version: 0.2.79
            llama_cpp_python.supports_gpu_offload: False
          [root@localhost ~]# 

