
Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: rhelai-1.5
Affects Version/s: rhelai-1.5
Component/s: InstructLab - Evaluation
Labels: None

Blocked: False
Blocked Reason: None
Ready: False
Intelligence Requested:
Market:

Release Blocker: Approved

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

From: https://gitlab.com/redhat/rhel-ai/diip/-/jobs/10000155067

dk-bench is failing with:

ERROR 2025-05-12 06:16:58,330 instructlab.cli.model.evaluate:313: An error occurred during evaluation: zstd C API versions mismatch; Python bindings were not compiled/linked against expected zstd version (10501 returned by the lib, 10501 hardcoded in zstd headers, 10506 hardcoded in the cext)
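
For reference, the three numbers in the error are zstd version numbers packed as MAJOR*10000 + MINOR*100 + RELEASE (the ZSTD_VERSION_NUMBER convention in zstd.h). A minimal sketch for decoding them; the helper name decode_zstd_version is illustrative and not part of any library:

```python
def decode_zstd_version(number: int) -> tuple[int, int, int]:
    """Split a packed zstd version number into (major, minor, release)."""
    major, rest = divmod(number, 10000)
    minor, release = divmod(rest, 100)
    return major, minor, release


# Values copied from the error message above.
reported = {
    "returned by the lib": 10501,
    "hardcoded in zstd headers": 10501,
    "hardcoded in the cext": 10506,
}

for source, number in reported.items():
    version = ".".join(str(part) for part in decode_zstd_version(number))
    print(f"{source}: {version}")
```

Read this way, the library and headers report zstd 1.5.1, while the compiled extension expects 1.5.6.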

The failure occurs when running:

curl -s https://raw.githubusercontent.com/instructlab/instructlab/main/scripts/test-data/dk-bench-questions.jsonl > ${HOME_DIR}/dk-bench-questions.jsonl
curl -s https://raw.githubusercontent.com/instructlab/instructlab/main/scripts/test-data/dk-bench-questions-with-responses.jsonl > ${HOME_DIR}/dk-bench-questions-with-responses.jsonl
export ILAB_ADDITIONAL_ENV="OPENAI_API_KEY='$OPENAI_API_KEY'" && ilab model evaluate --model ${trained_model} --benchmark dk_bench --input-questions ${HOME_DIR}/dk-bench-questions.jsonl --output-file-formats ${dk_bench_output_formats} 2>&1 | tee dk_bench.log
export ILAB_ADDITIONAL_ENV="OPENAI_API_KEY='$OPENAI_API_KEY'" && ilab model evaluate --model ${trained_model} --benchmark dk_bench --input-questions ${HOME_DIR}/dk-bench-questions-with-responses.jsonl --output-file-formats ${dk_bench_output_formats} 2>&1 | tee dk_bench_with_responses.log
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 8 CUDA devices:
Device 0: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
Device 1: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
Device 2: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
Device 3: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
Device 4: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
Device 5: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
Device 6: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
Device 7: NVIDIA A100-SXM4-80GB, compute capability 8.0, VMM: yes
Platform:
sys.version: 3.11.7 (main, Jan 8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
sys.platform: linux
os.name: posix
platform.release: 5.14.0-427.65.1.el9_4.x86_64
platform.machine: x86_64
platform.node: instructlab-ci-8xa100-preserve
platform.python_version: 3.11.7
os-release.ID: rhel
os-release.VERSION_ID: 9.4
os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
memory.total: 1259.87 GB
memory.available: 1250.45 GB
memory.used: 2.45 GB
InstructLab:
instructlab.version: 0.26.1
instructlab-dolomite.version: 0.2.0
instructlab-eval.version: 0.5.1
instructlab-quantize.version: 0.1.0
instructlab-schema.version: 0.4.2
instructlab-sdg.version: 0.8.2
instructlab-training.version: 0.10.2
Torch:
torch.version: 2.6.0
torch.backends.cpu.capability: AVX512
torch.version.cuda: 12.4
torch.version.hip: None
torch.cuda.available: True
torch.backends.cuda.is_built: True
torch.backends.mps.is_built: False
torch.backends.mps.is_available: False
torch.cuda.bf16: True
torch.cuda.current.device: 0
torch.cuda.0.name: NVIDIA A100-SXM4-80GB
torch.cuda.0.free: 78.7 GB
torch.cuda.0.total: 79.1 GB
torch.cuda.0.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.1.name: NVIDIA A100-SXM4-80GB
torch.cuda.1.free: 78.7 GB
torch.cuda.1.total: 79.1 GB
torch.cuda.1.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.2.name: NVIDIA A100-SXM4-80GB
torch.cuda.2.free: 78.7 GB
torch.cuda.2.total: 79.1 GB
torch.cuda.2.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.3.name: NVIDIA A100-SXM4-80GB
torch.cuda.3.free: 78.7 GB
torch.cuda.3.total: 79.1 GB
torch.cuda.3.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.4.name: NVIDIA A100-SXM4-80GB
torch.cuda.4.free: 78.7 GB
torch.cuda.4.total: 79.1 GB
torch.cuda.4.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.5.name: NVIDIA A100-SXM4-80GB
torch.cuda.5.free: 78.7 GB
torch.cuda.5.total: 79.1 GB
torch.cuda.5.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.6.name: NVIDIA A100-SXM4-80GB
torch.cuda.6.free: 78.7 GB
torch.cuda.6.total: 79.1 GB
torch.cuda.6.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
torch.cuda.7.name: NVIDIA A100-SXM4-80GB
torch.cuda.7.free: 78.7 GB
torch.cuda.7.total: 79.1 GB
torch.cuda.7.capability: 8.0 (see https://developer.nvidia.com/cuda-gpus#compute)
llama_cpp_python:
llama_cpp_python.version: 0.3.6
llama_cpp_python.supports_gpu_offload: True

Relates to: AIPCC-1352 (zstandard package does not use system zstd)
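
A quick way to check an environment for this mismatch, sketched under the assumption that the failure surfaces as an ImportError when the zstandard C extension is loaded. probe_zstandard is a hypothetical helper. `__version__`, `ZSTD_VERSION` and `backend` are attributes of the zstandard package, read here defensively with getattr:

```python
import importlib


def probe_zstandard() -> dict:
    """Try to import zstandard and report what it was built against."""
    try:
        module = importlib.import_module("zstandard")
    except ImportError as exc:
        # Assumption: a version mismatch (or a missing package) lands here.
        return {"ok": False, "error": str(exc)}
    return {
        "ok": True,
        "package_version": getattr(module, "__version__", None),
        "zstd_version": getattr(module, "ZSTD_VERSION", None),
        "backend": getattr(module, "backend", None),
    }


print(probe_zstandard())
```

Run with the same Python interpreter that ilab uses. An "ok": False result carrying the "zstd C API versions mismatch" text would point at the packaging problem tracked in the related issue.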