-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
RHAIIS-3.3
-
None
-
False
-
-
False
-
-
-
Critical
Steps to Reproduce
Refer to https://issues.redhat.com/browse/AIPCC-9449
1. Create a container on an x86 system with CUDA hardware using the image below:
The image tag is the same as before, but the image behind it has been updated.
Delete the old local image and pull again.
Container image: registry.gitlab.com/redhat/rhel-ai/core/base-images/app/aipcc-cuda13.0-el9.6-app-x86_64:ci_461
2. Install the wheel:
Torch Wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/51406256
Numba Wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/52183943
3. Run the following Python command:
python -c "from numba import cuda; cuda.cudadrv.libs.test()"
a. Observe the output for driver detection:
Finding driver from candidates:
libcuda.so
libcuda.so.1
/usr/lib/libcuda.so
/usr/lib/libcuda.so.1
/usr/lib64/libcuda.so
/usr/lib64/libcuda.so.1
Using loader <class 'ctypes.CDLL'>
Trying to load driver... ok
Loaded from libcuda.so
Mapped libcuda.so paths:
/usr/local/cuda-13.0/compat/libcuda.so.580.95.05
Finding nvvm from CUDA_HOME
Located at /usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0
Trying to open library... ok
Finding nvrtc from CUDA_HOME
Located at /usr/local/cuda/lib64/libnvrtc.so.13.0.88
Trying to open library... ok
Finding cudart from CUDA_HOME
Located at /usr/local/cuda/lib64/libcudart.so.13.0.96
Trying to open library... ok
Finding cudadevrt from CUDA_HOME
Located at /usr/local/cuda/lib64/libcudadevrt.a
Checking library... ok
Finding libdevice from CUDA_HOME
Located at /usr/local/cuda/nvvm/libdevice/libdevice.10.bc
Checking library... ok
b. Verify nvidia-smi output:
-----------------------------------------------------------------------------
| NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0 |
-----------------------------------------------------------------------------
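To cross-check the "Mapped libcuda.so paths" line independently of Numba, the process memory map can be inspected directly. The following is a minimal sketch, assuming a Linux container where /proc/self/maps is readable; the helper name mapped_libcuda_paths is hypothetical and not part of Numba.

```python
def mapped_libcuda_paths(maps_text):
    """Return the unique libcuda paths found in /proc/<pid>/maps content."""
    paths = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, [pathname]
        fields = line.split(None, 5)
        if len(fields) != 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        # Matches libcuda.so, libcuda.so.1, libcuda.so.<driver version>,
        # but not libcudart or libcudadebugger.
        if name.startswith("libcuda.so") and path not in paths:
            paths.append(path)
    return paths


def report_loaded_libcuda(candidate="libcuda.so"):
    """Load the candidate with ctypes, then list what was actually mapped."""
    import ctypes

    ctypes.CDLL(candidate)  # raises OSError if the library cannot be loaded
    with open("/proc/self/maps") as maps_file:
        return mapped_libcuda_paths(maps_file.read())
```

Calling report_loaded_libcuda() inside the container should list the same path that Numba prints under "Mapped libcuda.so paths", if the two loads resolve the same way.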
Observed Behavior:
- numba.cuda detects the driver as 580.95.05 (from /usr/local/cuda-13.0/compat/libcuda.so.580.95.05)
- nvidia-smi reports driver version 580.82.07
Expected Behavior:
- numba.cuda and nvidia-smi should report the same driver version.
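The comparison above can be automated. This is a minimal sketch, assuming the driver version is the numeric suffix of the mapped libcuda file name and that nvidia-smi prints a "Driver Version:" field as shown in step 3b; the function names are hypothetical.

```python
import re


def version_from_libcuda_path(path):
    """Extract e.g. '580.95.05' from '.../libcuda.so.580.95.05'."""
    match = re.search(r"libcuda\.so\.(\d+(?:\.\d+)+)$", path)
    return match.group(1) if match else None


def version_from_nvidia_smi(text):
    """Extract the value of the 'Driver Version:' field from nvidia-smi output."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def versions_match(libcuda_path, smi_text):
    """True only when both versions were found and are identical."""
    lib_version = version_from_libcuda_path(libcuda_path)
    smi_version = version_from_nvidia_smi(smi_text)
    return lib_version is not None and lib_version == smi_version
```

With the values from this report, versions_match returns False, which is the mismatch described under Observed Behavior.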
Additional Information:
- CUDA libraries loaded by Numba:
nvvm: /usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0
nvrtc: /usr/local/cuda/lib64/libnvrtc.so.13.0.88
cudart: /usr/local/cuda/lib64/libcudart.so.13.0.96
cudadevrt: /usr/local/cuda/lib64/libcudadevrt.a
libdevice: /usr/local/cuda/nvvm/libdevice/libdevice.10.bc
- Output of ldconfig:
(venv) (app-root) /opt/app-root$ ldconfig -p | grep libcuda
libcudart.so.13 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.13
libcudart.so.11.0 (libc6,x86-64) => /usr/local/cuda-11/targets/x86_64-linux/lib/libcudart.so.11.0
libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so
libcudadebugger.so.1 (libc6,x86-64) => /usr/lib64/libcudadebugger.so.1
libcudadebugger.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcudadebugger.so.1
libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
libcuda.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so.1
libcuda.so (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so
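The ldconfig output can be grouped by library name to see which paths are registered for each of Numba's candidates. This is a minimal sketch that only parses `ldconfig -p` text; the helper name ldconfig_candidates is hypothetical. One possible reading of the result, offered as a hypothesis and not a confirmed root cause: the unversioned libcuda.so has only the compat entry, and Numba reports "Loaded from libcuda.so", so the compat library is the one that gets picked up.

```python
def ldconfig_candidates(ldconfig_output, soname):
    """Return the paths that `ldconfig -p` lists for an exact library name,
    in the order they appear in the output."""
    paths = []
    for line in ldconfig_output.splitlines():
        left, sep, right = line.partition("=>")
        if not sep:
            continue  # header or unrelated line
        # The name is everything before the "(libc6,x86-64)" annotation.
        name = left.split("(")[0].strip()
        if name == soname:
            paths.append(right.strip())
    return paths
```

For the output in this report, ldconfig_candidates(text, "libcuda.so") yields only /usr/local/cuda/compat/libcuda.so, while "libcuda.so.1" yields /usr/lib64/libcuda.so.1 followed by /usr/local/cuda/compat/libcuda.so.1.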
- is Informed by: AIPCC-9449 Numba CUDA fails when CUDA is pre-initialized by vLLM / PyTorch (Closed)