AI Platform Core Components / AIPCC-9520

Mismatch Between numba.cuda Driver Detection and nvidia-smi Output

    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • RHAIIS-3.3
    • Accelerator Enablement
    • None
    • False
    • False
    • Critical

      Steps to Reproduce
      Refer to https://issues.redhat.com/browse/AIPCC-9449

             1. Create a container on an x86 system with CUDA hardware using the image below.
      The tag is the same as before, but the image behind it is newer,
      so delete the old local image and pull it again.

      Container image: registry.gitlab.com/redhat/rhel-ai/core/base-images/app/aipcc-cuda13.0-el9.6-app-x86_64:ci_461

             2. Install the wheels:

      Torch Wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/51406256
       
      Numba Wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/52183943

            3. Run the following Python command:
      python -c "from numba import cuda; cuda.cudadrv.libs.test()"
       
       
                      a. Observe the output for driver detection:
      Finding driver from candidates:
      libcuda.so
      libcuda.so.1
      /usr/lib/libcuda.so
      /usr/lib/libcuda.so.1
      /usr/lib64/libcuda.so
      /usr/lib64/libcuda.so.1
      Using loader <class 'ctypes.CDLL'>
      Trying to load driver... ok
      Loaded from libcuda.so
      Mapped libcuda.so paths:
      /usr/local/cuda-13.0/compat/libcuda.so.580.95.05
      Finding nvvm from CUDA_HOME
      Located at /usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0
      Trying to open library... ok
      Finding nvrtc from CUDA_HOME
      Located at /usr/local/cuda/lib64/libnvrtc.so.13.0.88
      Trying to open library... ok
      Finding cudart from CUDA_HOME
      Located at /usr/local/cuda/lib64/libcudart.so.13.0.96
      Trying to open library... ok
      Finding cudadevrt from CUDA_HOME
      Located at /usr/local/cuda/lib64/libcudadevrt.a
      Checking library... ok
      Finding libdevice from CUDA_HOME
      Located at /usr/local/cuda/nvvm/libdevice/libdevice.10.bc
      Checking library... ok
       
                      b. Check the nvidia-smi output:
      -----------------------------------------------------------------------------

      NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0

      -----------------------------------------------------------------------------
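
To double-check which libcuda file a process has actually mapped, independently of the Numba report in step 3a, the process memory map can be inspected. The following is a minimal sketch, not taken from Numba's source: the helper name `mapped_paths` and the sample text are hypothetical, the addresses are made up, and it assumes the Linux /proc/<pid>/maps layout in which the backing file path is the sixth whitespace-separated field.

```python
import os


def mapped_paths(maps_text, needle="libcuda.so"):
    """Return distinct mapped file paths whose basename contains `needle`."""
    found = []
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, dev, inode, pathname (optional).
        fields = line.split(maxsplit=5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        if needle in os.path.basename(path) and path not in found:
            found.append(path)
    return found


# Hypothetical excerpt in the format of /proc/self/maps.
SAMPLE_MAPS = """\
7f10a0000000-7f10a0200000 r-xp 00000000 fd:01 1001 /usr/local/cuda-13.0/compat/libcuda.so.580.95.05
7f10a0200000-7f10a0400000 rw-p 00200000 fd:01 1001 /usr/local/cuda-13.0/compat/libcuda.so.580.95.05
7f10b0000000-7f10b0021000 rw-p 00000000 00:00 0
7f10c0000000-7f10c0100000 r-xp 00000000 fd:01 2002 /usr/lib64/libc.so.6
"""

print(mapped_paths(SAMPLE_MAPS))
```

On a live system, the same helper could be fed the contents of /proc/self/maps read after the driver library has been loaded (for example after `ctypes.CDLL("libcuda.so")`), instead of the sample text.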
       
       
      Observed Behavior:

      • numba.cuda loads libcuda from /usr/local/cuda-13.0/compat/libcuda.so.580.95.05, that is, driver library version 580.95.05.
      • nvidia-smi reports driver version 580.82.07.

      Expected Behavior:

      • numba.cuda and nvidia-smi should report the same driver version.
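
The comparison described above can be automated. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the version is encoded in the libcuda file name and that the nvidia-smi banner contains a "Driver Version:" field, as in the output pasted in this issue. The two input strings are copied from that output.

```python
import re


def version_from_library_path(path):
    """Extract the version suffix from a name like libcuda.so.580.95.05.

    A bare soname such as libcuda.so.1 yields None, because a single
    number is the soname version, not a driver version.
    """
    match = re.search(r"libcuda\.so\.(\d+(?:\.\d+)+)$", path)
    return match.group(1) if match else None


def version_from_smi_banner(text):
    """Extract the value of the 'Driver Version:' field from nvidia-smi output."""
    match = re.search(r"Driver Version:\s*(\d+(?:\.\d+)+)", text)
    return match.group(1) if match else None


def as_tuple(version):
    """Turn '580.95.05' into (580, 95, 5) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


loaded = version_from_library_path(
    "/usr/local/cuda-13.0/compat/libcuda.so.580.95.05"
)
reported = version_from_smi_banner(
    "NVIDIA-SMI 580.82.07 Driver Version: 580.82.07 CUDA Version: 13.0"
)

print(loaded, reported, loaded == reported)  # 580.95.05 580.82.07 False
```

With the values from this report, the loaded library is newer than the version nvidia-smi reports: `as_tuple(loaded) > as_tuple(reported)` evaluates to `True`.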

      Additional Information:

      • CUDA libraries loaded by Numba:
        nvvm: /usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0
        nvrtc: /usr/local/cuda/lib64/libnvrtc.so.13.0.88
        cudart: /usr/local/cuda/lib64/libcudart.so.13.0.96
        cudadevrt: /usr/local/cuda/lib64/libcudadevrt.a
        libdevice: /usr/local/cuda/nvvm/libdevice/libdevice.10.bc
      • Output of ldconfig:
        (venv) (app-root) /opt/app-root$ ldconfig -p | grep libcuda
        	libcudart.so.13 (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so.13
        	libcudart.so.11.0 (libc6,x86-64) => /usr/local/cuda-11/targets/x86_64-linux/lib/libcudart.so.11.0
        	libcudart.so (libc6,x86-64) => /usr/local/cuda/targets/x86_64-linux/lib/libcudart.so
        	libcudadebugger.so.1 (libc6,x86-64) => /usr/lib64/libcudadebugger.so.1
        	libcudadebugger.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcudadebugger.so.1
        	libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
        	libcuda.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so.1
        	libcuda.so (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so
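
One detail in the ldconfig output may be relevant: the name `libcuda.so.1` has two cache entries, with /usr/lib64 listed first, while the unversioned name `libcuda.so` has only the compat entry. The Numba output in step 3a reports "Loaded from libcuda.so", which would be consistent with the compat library being picked up. The sketch below groups cache entries by library name to make this visible. It is a hypothetical helper, not part of any tool; the sample lines are copied from the output above, and it assumes the `name (flags) => path` line format shown there. Whether the printed order matches the loader's lookup precedence, and whether LD_LIBRARY_PATH is also involved, are not verified here.

```python
import re

# Matches lines such as: "libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1"
LDCONFIG_LINE = re.compile(r"^\s*(\S+) \((.*?)\) => (\S+)\s*$")


def cache_entries(ldconfig_output, soname):
    """Return the paths listed for `soname`, in the order they were printed."""
    paths = []
    for line in ldconfig_output.splitlines():
        match = LDCONFIG_LINE.match(line)
        if match and match.group(1) == soname:
            paths.append(match.group(3))
    return paths


# libcuda lines copied from the `ldconfig -p | grep libcuda` output in this issue.
SAMPLE_LDCONFIG = """\
    libcuda.so.1 (libc6,x86-64) => /usr/lib64/libcuda.so.1
    libcuda.so.1 (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so.1
    libcuda.so (libc6,x86-64) => /usr/local/cuda/compat/libcuda.so
"""

for name in ("libcuda.so", "libcuda.so.1"):
    print(name, "->", cache_entries(SAMPLE_LDCONFIG, name))
```

On a live system, the sample text could be replaced with the captured output of `ldconfig -p`.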

              Unassigned Unassigned
              rh-ee-vshaw Vikash Shaw
              Frank's Team