Red Hat Enterprise Linux AI / RHELAI-3006

RHEL_AI_VERSION_ID is empty for AMD bootc images


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • Fix Version/s: rhelai-1.4
    • Severity: Moderate

      To Reproduce

      Steps to reproduce the behavior:

      1. Boot any 1.3 AMD bootc image.
      2. Run cat /etc/os-release.
      3. Notice the following at the bottom of the output (a quick check is sketched below):
      RHEL_AI_VERSION_ID=''
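
      For a scripted check, /etc/os-release can be sourced directly, since it is a
      valid shell fragment; a minimal sketch (assumes only the os-release contents
      shown above):

          $ source /etc/os-release
          $ echo "RHEL_AI_VERSION_ID='${RHEL_AI_VERSION_ID}'"
          RHEL_AI_VERSION_ID=''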

      Expected behavior

      • RHEL_AI_VERSION_ID="<rhel ai version being used>"

      Device Info (please complete the following information):

      • Hardware Specs: any AMD Accelerator
      • OS Version: any RHEL AI AMD build
      • InstructLab Version: ilab, version 0.21.2
      • Provide the output of these two commands:
        • sudo bootc status --format json | jq .status.booted.image.image.image
        • ilab system info
           
          $ ilab system info
          Platform:
            sys.version: 3.11.7 (main, Oct  9 2024, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
            sys.platform: linux
            os.name: posix
            platform.release: 5.14.0-427.42.1.el9_4.x86_64
            platform.machine: x86_64
            platform.node: GPUFC5Q
            platform.python_version: 3.11.7
            os-release.ID: rhel
            os-release.VERSION_ID: 9.4
            os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
            memory.total: 3023.54 GB
            memory.available: 2979.15 GB
            memory.used: 31.47 GB
          InstructLab:
            instructlab.version: 0.21.2
            instructlab-dolomite.version: 0.2.0
            instructlab-eval.version: 0.4.1
            instructlab-quantize.version: 0.1.0
            instructlab-schema.version: 0.4.1
            instructlab-sdg.version: 0.6.1
            instructlab-training.version: 0.6.1
          Torch:
            torch.version: 2.4.1
            torch.backends.cpu.capability: AVX512
            torch.version.cuda: None
            torch.version.hip: 6.2.41134-65d174c3e
            torch.cuda.available: True
            torch.backends.cuda.is_built: True
            torch.backends.mps.is_built: False
            torch.backends.mps.is_available: False
            torch.cuda.bf16: True
            torch.cuda.current.device: 0
            torch.cuda.0.name: AMD Radeon Graphics
            torch.cuda.0.free: 191.4 GB
            torch.cuda.0.total: 192.0 GB
            torch.cuda.0.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
            torch.cuda.1.name: AMD Radeon Graphics
            torch.cuda.1.free: 191.4 GB
            torch.cuda.1.total: 192.0 GB
            torch.cuda.1.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
            torch.cuda.2.name: AMD Radeon Graphics
            torch.cuda.2.free: 191.4 GB
            torch.cuda.2.total: 192.0 GB
            torch.cuda.2.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
            torch.cuda.3.name: AMD Radeon Graphics
            torch.cuda.3.free: 191.4 GB
            torch.cuda.3.total: 192.0 GB
            torch.cuda.3.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
            torch.cuda.4.name: AMD Radeon Graphics
            torch.cuda.4.free: 191.4 GB
            torch.cuda.4.total: 192.0 GB
            torch.cuda.4.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
            torch.cuda.5.name: AMD Radeon Graphics
            torch.cuda.5.free: 191.4 GB
            torch.cuda.5.total: 192.0 GB
            torch.cuda.5.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
            torch.cuda.6.name: AMD Radeon Graphics
            torch.cuda.6.free: 191.4 GB
            torch.cuda.6.total: 192.0 GB
            torch.cuda.6.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
            torch.cuda.7.name: AMD Radeon Graphics
            torch.cuda.7.free: 191.4 GB
            torch.cuda.7.total: 192.0 GB
            torch.cuda.7.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
          llama_cpp_python:
            llama_cpp_python.version: 0.2.79
            llama_cpp_python.supports_gpu_offload: True

      Bug impact

      • End users and automation cannot determine the installed RHEL AI version from /etc/os-release on AMD images; certification tooling in particular depends on this field (see Additional context below).

      Known workaround

      • It is generally possible to infer the version from the InstructLab container tag, and sometimes from bootc status output if the system is not following a floating tag, but either approach is difficult to parse reliably in code (see the sketch below).
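
      A minimal sketch of the bootc approach, assuming the booted image reference
      ends in a version-bearing tag (the tag format is an assumption; a floating
      tag such as "latest" defeats this):

          # Read the booted image reference (same command as in Device Info above),
          # then strip everything up to the last ':' to recover the tag.
          IMAGE=$(sudo bootc status --format json | jq -r '.status.booted.image.image.image')
          VERSION="${IMAGE##*:}"   # e.g. 1.3, if the tag carries the version
          echo "inferred RHEL AI version: ${VERSION}"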

      Additional context

      • Certification tooling depends on this information (see the check sketched below).
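
      For illustration, the kind of check such tooling might perform (a
      hypothetical sketch, not the actual certification code):

          # Fail if RHEL_AI_VERSION_ID is missing or empty in /etc/os-release.
          . /etc/os-release
          if [ -z "${RHEL_AI_VERSION_ID:-}" ]; then
              echo "FAIL: RHEL_AI_VERSION_ID is unset or empty" >&2
              exit 1
          fi
          echo "PASS: RHEL AI version ${RHEL_AI_VERSION_ID}"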

              Assignee: Prarit Bhargava (prarit@redhat.com)
              Reporter: Tim Flink (tflink)