Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: rhelai-1.3.2
Affects Version/s: RHELAI 1.3 GA
Component/s: InstructLab - Evaluation
Labels:
- closed-upstream

Blocked: False
Blocked Reason: None
Ready: False
Documentation Type: Release Notes
Release Note Type: Known Issue
Git Pull Request: https://github.com/instructlab/eval/pull/197, https://github.com/instructlab/instructlab/pull/2778
Severity: Critical


Run the MMLU evaluation on the granite-8b-starter model:

ilab model evaluate --model /var/mnt/instg1/instructlab/models/granite-8b-starter/ --benchmark mmlu --gpus 8 --enable-serving-output

The command fails with a 500 Internal Server Error. The problem is tracked in the community issue https://github.com/instructlab/eval/issues/195
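For anyone scripting the reproduction, the following is a minimal sketch that rebuilds the command above from its parts and scans captured output for the server error. The helper names (`build_mmlu_command`, `mentions_server_error`) are hypothetical, and the assumption that the phrase "Internal Server Error" appears in the command's output is based only on the description in this report, not on a captured log.

```python
import shlex


def build_mmlu_command(model_path: str, gpus: int) -> list[str]:
    """Build the argument list for the reproduction command in this report.

    Only flags that appear in the report are used.
    """
    return [
        "ilab", "model", "evaluate",
        "--model", model_path,
        "--benchmark", "mmlu",
        "--gpus", str(gpus),
        "--enable-serving-output",
    ]


def mentions_server_error(output: str) -> bool:
    """Return True if the captured output mentions an internal server error.

    Assumption: the failure surfaces as the text "Internal Server Error"
    somewhere in stdout or stderr; the match is case-insensitive.
    """
    return "internal server error" in output.lower()


# Printable form of the command, suitable for pasting into a shell.
command = build_mmlu_command(
    "/var/mnt/instg1/instructlab/models/granite-8b-starter/", 8
)
print(shlex.join(command))
```

To actually run the check, the argument list can be passed to `subprocess.run(command, capture_output=True, text=True)` and the combined `stdout` and `stderr` handed to `mentions_server_error`.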

Expected behavior

The MMLU evaluation should run to completion and output results.

Screenshots

Attached Image

Device Info (please complete the following information):

Hardware Specs: 8xA100 IBM Cloud VSI
OS Version: RHEL AI 1.3
InstructLab Version: 0.21.0
Provide the output of these two commands:
- "registry.redhat.io/rhelai1/bootc-ibm-nvidia-rhel9:1.3"

```
[root@tyler-machine-boot-6 ~]# ilab system info
Platform:
sys.version: 3.11.7 (main, Oct 9 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
sys.platform: linux
os.name: posix
platform.release: 5.14.0-427.42.1.el9_4.x86_64
platform.machine: x86_64
platform.node: tyler-machine-boot-6
platform.python_version: 3.11.7
os-release.ID: rhel
os-release.VERSION_ID: 9.4
os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
memory.total: 1259.87 GB
memory.available: 1246.53 GB
memory.used: 5.41 GB
InstructLab:
instructlab.version: 0.21.0
instructlab-dolomite.version: 0.2.0
instructlab-eval.version: 0.4.1
instructlab-quantize.version: 0.1.0
instructlab-schema.version: 0.4.1
instructlab-sdg.version: 0.6.1
instructlab-training.version: 0.6.1
Torch:
torch.version: 2.4.1
torch.backends.cpu.capability: AVX512
torch.version.cuda: 12.4
torch.version.hip: None
torch.cuda.available: True
torch.backends.cuda.is_built: True
torch.backends.mps.is_built: False
torch.backends.mps.is_available: False
torch.cuda.bf16: True
torch.cuda.current.device: 0
torch.cuda.0.name: NVIDIA A100-SXM4-80GB
torch.cuda.0.free: 78.7 GB
torch.cuda.0.total: 79.1 GB
torch.cuda.0.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.1.name: NVIDIA A100-SXM4-80GB
torch.cuda.1.free: 78.7 GB
torch.cuda.1.total: 79.1 GB
torch.cuda.1.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.2.name: NVIDIA A100-SXM4-80GB
torch.cuda.2.free: 78.7 GB
torch.cuda.2.total: 79.1 GB
torch.cuda.2.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.3.name: NVIDIA A100-SXM4-80GB
torch.cuda.3.free: 78.7 GB
torch.cuda.3.total: 79.1 GB
torch.cuda.3.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.4.name: NVIDIA A100-SXM4-80GB
torch.cuda.4.free: 78.7 GB
torch.cuda.4.total: 79.1 GB
torch.cuda.4.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.5.name: NVIDIA A100-SXM4-80GB
torch.cuda.5.free: 78.7 GB
torch.cuda.5.total: 79.1 GB
torch.cuda.5.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.6.name: NVIDIA A100-SXM4-80GB
torch.cuda.6.free: 78.7 GB
torch.cuda.6.total: 79.1 GB
torch.cuda.6.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.7.name: NVIDIA A100-SXM4-80GB
torch.cuda.7.free: 78.7 GB
torch.cuda.7.total: 79.1 GB
torch.cuda.7.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
llama_cpp_python:
llama_cpp_python.version: 0.2.79
llama_cpp_python.supports_gpu_offload: True
```
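When comparing environments across reports, it can help to reduce a dump like the one above to a few fields. The sketch below parses `key: value` lines into a dictionary and lists the GPU names. The function names are hypothetical, and the parsing assumes the line layout shown in this report (section headers and the shell prompt contain no `": "` separator and are skipped); it is not based on any documented output format.

```python
def parse_system_info(text: str) -> dict[str, str]:
    """Parse 'key: value' lines from `ilab system info` output.

    Lines without a ': ' separator (blank lines, section headers such as
    'Platform:', the shell prompt) are ignored.
    """
    info: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        key, sep, value = line.partition(": ")
        if sep:
            info[key] = value
    return info


def gpu_names(info: dict[str, str]) -> list[str]:
    """Return the values of all torch.cuda.<index>.name entries."""
    return [
        value
        for key, value in sorted(info.items())
        if key.startswith("torch.cuda.") and key.endswith(".name")
    ]


sample = """Platform:
sys.platform: linux
InstructLab:
instructlab-eval.version: 0.4.1
Torch:
torch.cuda.0.name: NVIDIA A100-SXM4-80GB
torch.cuda.1.name: NVIDIA A100-SXM4-80GB
"""

parsed = parse_system_info(sample)
print(parsed["instructlab-eval.version"], len(gpu_names(parsed)))
```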


Assignee: Oleg Silkin
Reporter: Tyler Lisowski
Contributors: Mustafa Eyceoz
Votes: 0
Watchers: 7
Created: 2024/12/09 7:34 PM
Updated: 2024/12/17 7:25 PM
