- Bug
- Resolution: Done
- Undefined
- rhelai-1.4
- None
- False
- False
- Critical
- Proposed
To Reproduce
Steps to reproduce the behavior:
- Log into a RHEL AI instance containing this change as a non-root user
- Run `ilab`:

$ ilab system info
Trying to pull registry.stage.redhat.io/rhelai1/instructlab-amd-rhel9:1.4-1738264488...
Error: initializing source docker://registry.stage.redhat.io/rhelai1/instructlab-amd-rhel9:1.4-1738264488: unable to retrieve auth token: invalid username/password: unauthorized: Please login to the Red Hat Registry using your Customer Portal credentials. Further instructions can be found here: https://access.redhat.com/RegistryAuthentication
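The unauthorized error is consistent with registry credentials being stored per user: podman resolves `auth.json` following the containers-auth.json(5) search order, so a `podman login` performed as root is not visible to a non-root user. A minimal sketch of that path resolution (the `auth_path` helper is illustrative, not part of any tool here):

```shell
# Where podman looks for registry credentials, per containers-auth.json(5):
# ${XDG_RUNTIME_DIR}/containers/auth.json, falling back to
# /run/containers/$UID/auth.json when XDG_RUNTIME_DIR is unset.
auth_path() {
  uid="$1"
  xdg_runtime_dir="$2"
  if [ -n "$xdg_runtime_dir" ]; then
    echo "$xdg_runtime_dir/containers/auth.json"
  else
    echo "/run/containers/$uid/auth.json"
  fi
}

# root and a non-root user resolve different files, so logging in to
# registry.stage.redhat.io as one user does not authenticate the other.
auth_path 0 /run/user/0        # -> /run/user/0/containers/auth.json
auth_path 1000 /run/user/1000  # -> /run/user/1000/containers/auth.json
```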
Expected behavior
- The `ilab` command should work for a non-root user
Device Info (please complete the following information):
- Hardware Specs: x86_64, MI300X bare metal
- OS Version: RHEL AI 1.4
- InstructLab Version: 0.23.1
- Output of the two requested commands:
  - `sudo bootc status --format json | jq .status.booted.image.image.image`
    "registry.stage.redhat.io/rhelai1/bootc-amd-rhel9:1.4-1738329869"
  - `ilab system info` (output below)
# ilab system info
Platform:
  sys.version: 3.11.7 (main, Jan 8 2025, 00:00:00) [GCC 11.4.1 20231218 (Red Hat 11.4.1-3)]
  sys.platform: linux
  os.name: posix
  platform.release: 5.14.0-427.50.1.el9_4.x86_64
  platform.machine: x86_64
  platform.node: GPUF333
  platform.python_version: 3.11.7
  os-release.ID: rhel
  os-release.VERSION_ID: 9.4
  os-release.PRETTY_NAME: Red Hat Enterprise Linux 9.4 (Plow)
  memory.total: 3023.54 GB
  memory.available: 2983.71 GB
  memory.used: 30.85 GB

InstructLab:
  instructlab.version: 0.23.1
  instructlab-dolomite.version: 0.2.0
  instructlab-eval.version: 0.5.1
  instructlab-quantize.version: 0.1.0
  instructlab-schema.version: 0.4.2
  instructlab-sdg.version: 0.7.0
  instructlab-training.version: 0.7.0

Torch:
  torch.version: 2.4.1
  torch.backends.cpu.capability: AVX512
  torch.version.cuda: None
  torch.version.hip: 6.2.41134-65d174c3e
  torch.cuda.available: True
  torch.backends.cuda.is_built: True
  torch.backends.mps.is_built: False
  torch.backends.mps.is_available: False
  torch.cuda.bf16: True
  torch.cuda.current.device: 0
  torch.cuda.0.name: AMD Radeon Graphics
  torch.cuda.0.free: 191.4 GB
  torch.cuda.0.total: 192.0 GB
  torch.cuda.0.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.1.name: AMD Radeon Graphics
  torch.cuda.1.free: 191.4 GB
  torch.cuda.1.total: 192.0 GB
  torch.cuda.1.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.2.name: AMD Radeon Graphics
  torch.cuda.2.free: 191.4 GB
  torch.cuda.2.total: 192.0 GB
  torch.cuda.2.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.3.name: AMD Radeon Graphics
  torch.cuda.3.free: 191.4 GB
  torch.cuda.3.total: 192.0 GB
  torch.cuda.3.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.4.name: AMD Radeon Graphics
  torch.cuda.4.free: 191.4 GB
  torch.cuda.4.total: 192.0 GB
  torch.cuda.4.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.5.name: AMD Radeon Graphics
  torch.cuda.5.free: 191.4 GB
  torch.cuda.5.total: 192.0 GB
  torch.cuda.5.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.6.name: AMD Radeon Graphics
  torch.cuda.6.free: 191.4 GB
  torch.cuda.6.total: 192.0 GB
  torch.cuda.6.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)
  torch.cuda.7.name: AMD Radeon Graphics
  torch.cuda.7.free: 191.4 GB
  torch.cuda.7.total: 192.0 GB
  torch.cuda.7.capability: 9.4 (see https://developer.nvidia.com/cuda-gpus#compute)

llama_cpp_python:
  llama_cpp_python.version: 0.3.2
  llama_cpp_python.supports_gpu_offload: False
Bug impact
- Users will not be able to use `ilab` on affected RHEL AI images
Known workaround
- Running all `ilab` commands as root works, but will almost certainly hit disk-space issues due to the default partitioning scheme
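If the root workaround is used anyway, checking free space on the root filesystem before large image or model pulls can avoid surprises. A hedged sketch; the 70 GB budget is an illustrative assumption, not a documented requirement:

```shell
# Report whole-GB free space on the filesystem holding a given path
# (POSIX df output: field 4 of the data row is available KiB).
free_gb() {
  df -P -k "$1" | awk 'NR==2 {print int($4/1024/1024)}'
}

# Hypothetical pre-flight check before running ilab as root;
# 70 GB is only an illustrative download budget.
need_gb=70
avail=$(free_gb /)
if [ "$avail" -lt "$need_gb" ]; then
  echo "only ${avail} GB free on /; expect disk-space failures when running as root"
fi
```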
Duplicates
- RHELAI-3217: Document update step to access shared instructlab image (Closed)