
Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: rhelai-1.5
Affects Version/s: rhelai-1.5
Component/s: InstructLab - Core
Labels:
- Gaudi
- Intel

Blocked: False
Blocked Reason: None
Ready: False
Git Pull Request: https://github.com/instructlab/instructlab/pull/3358/files
Intelligence Requested:
Market:
Release Blocker: Approved
SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

To Reproduce

Steps to reproduce the behavior:

Run: ilab model serve --model-path ~/.cache/instructlab/models/granite-8b-lab-v1

[root@g3-srv15-c03b-idc ~]# ilab model serve --model-path ~/.cache/instructlab/models/granite-8b-lab-v1
INFO 2025-05-02 16:36:38,799 instructlab.model.serve_backend:79: Setting backend_type in the serve config to vllm
INFO 2025-05-02 16:36:38,816 instructlab.model.serve_backend:85: Using model '/root/.cache/instructlab/models/granite-8b-lab-v1' with -1 gpu-layers and 4096 max context size.
ERROR 2025-05-02 16:36:38,817 instructlab.model.serve_backend:120: Specified --gpus value (8) exceeds available GPUs (0).
Please specify a valid number of GPUs.
[root@g3-srv15-c03b-idc ~]#

Device Info

Hardware Specs: Intel Gaudi 3 Server (Intel SDP Platform) with 8 accelerators (see the attached hl-smi output)
OS Version: Red Hat Enterprise Linux 9.4 / RHEL AI 1.5
InstructLab Version: ilab, version 0.26.0a1
Output of the following two commands:
- sudo bootc status --format json | jq .status.booted.image.image.image

[root@g3-srv15-c03b-idc ~]# sudo bootc status --format json | jq .status.booted.image.image.image
"registry.stage.redhat.io/rhelai1/bootc-intel-rhel9:1.5-1746033450"
[root@g3-srv15-c03b-idc ~]#

- ilab system info (prints detailed information about the InstructLab version, OS, and hardware, including GPU / AI accelerator hardware)

------------------------ System Configuration ------------------------
Num CPU Cores : 224
CPU RAM : 1056269984 KB
------------------------------------------------------------------------------
Platform:
sys.version: 3.11.7 (main, Jan 8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
sys.platform: linux
os.name: posix
platform.release: 5.14.0-427.62.1.el9_4.x86_64
platform.machine: x86_64
platform.node: g3-srv15-c03b-idc
platform.python_version: 3.11.7
os-release.ID: rhel
os-release.VERSION_ID: 9.4
os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
memory.total: 1007.34 GB
memory.available: 993.65 GB
memory.used: 8.85 GB
See the attached file for the complete output.

Bug impact

Not able to serve a model

Known workaround

None

Additional context

The system was updated from RHEL AI 1.4 to RHEL AI 1.5 with:

bootc switch registry.stage.redhat.io/rhelai1/bootc-intel-rhel9:1.5-1746033

ilab config init was then re-run.

Attachments:

- g3-srv15-c03b-idc-hl-smi-output.txt (3 kB, 2025/05/02 4:55 PM)
- g3-srv15-c03b-idc-ilab-system-info (4 kB, 2025/05/02 4:52 PM)

Mentioned on:

Merge request - RHELAI-4052: Update dependency instructlab to v0.26.1 (1.5)

Merge request - RHELAI-4052: Update dependency instructlab to v0.26.1 (main)

Assignee: Charles Doern
Reporter: Bertrand Rault
Contributors: Charles Doern
Votes: 0
Watchers: 8
Created: 2025/05/02 4:51 PM
Updated: 2025/05/11 8:29 PM
Resolved: 2025/05/11 8:29 PM
