AI Platform Core Components / AIPCC-9449

Numba CUDA fails when CUDA is pre-initialized by vLLM / PyTorch

    • Bug
    • Resolution: Done
    • Critical
    • RHAIIS-3.3
    • Accelerator Enablement
    • AIPCC Accelerators 25

      Summary

      The Numba CUDA probe reports a failure after installing vllm because CUDA is eagerly initialized by PyTorch/triton earlier in the process. When Numba subsequently attempts to initialize the CUDA driver, it receives CUDA_ERROR_OPERATING_SYSTEM (304), which is expected CUDA behavior in this scenario. The environment and GPU are healthy, and CUDA workloads function correctly via PyTorch/vLLM.

      Debugging is in progress to determine whether this is a bug or correct behavior.
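      The eager-initialization diagnosis above can be checked directly: on Linux, once PyTorch (or triton) has initialized CUDA, the driver library shows up in the process's memory maps. A minimal sketch (the helper name and the maps-path parameter are illustrative, not from the report):

      ```python
      # Sketch: detect whether the CUDA driver is already mapped into this
      # process, i.e. an earlier import (torch/triton) initialized CUDA first.
      # Linux-only; relies on /proc/self/maps.
      def cuda_driver_loaded(maps_path="/proc/self/maps"):
          try:
              with open(maps_path) as f:
                  return any("libcuda.so" in line for line in f)
          except OSError:
              return False  # no /proc on this platform, or path unreadable

      if __name__ == "__main__":
          print("CUDA driver already loaded:", cuda_driver_loaded())
      ```

      Running this after `import torch` but before any Numba call would confirm that the driver is already live when Numba attempts its own initialization.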

      Environment

      • GPU: NVIDIA L4 (1 GPU)
      • Driver: 580.82.07
      • CUDA Version: 13.0
      • OS: Linux
      • Python: 3.12
      • Packages involved:
        • numba
        • torch

      Note: This also fails on the previous RHAIIS 3.2.5 release.

      Reproduction Steps

      1. Create a container on an x86 system with CUDA hardware using the container image below:
         Container image: registry.gitlab.com/redhat/rhel-ai/core/base-images/app/aipcc-cuda13.0-el9.6-app-x86_64:ci_461
      2. Install the Torch and Numba wheels:
         Torch wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/51406256
         Numba wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/52183943
      3. Run the following Python code:
         
      ```
      # repro_vllm_numba_conflict.py
      import torch  # importing torch eagerly initializes CUDA
      from numba import cuda
      import numpy as np

      print("Torch CUDA available:", torch.cuda.is_available())
      print("Numba sees GPUs:", len(cuda.gpus))

      # This is where it dies after the vllm install:
      cuda.select_device(0)
      arr = cuda.device_array(10, dtype=np.float32)
      print("SUCCESS (you will NOT see this)")
      ```
      Error observed:

      ```
      numba.cuda.cudadrv.error.CudaSupportError:
      Call to cuInit results in CUDA_ERROR_OPERATING_SYSTEM (304)
      ```
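      The cuInit status can also be read without Numba in the process at all. The sketch below (the function name and fallback behavior are illustrative) loads the driver via ctypes and prints the raw return code, which should reproduce the 304 under the same conditions:

      ```python
      # Sketch: call cuInit directly via ctypes and report its raw status code.
      # 0 == CUDA_SUCCESS; 304 == CUDA_ERROR_OPERATING_SYSTEM (as in this issue).
      import ctypes

      def probe_cuinit(libname="libcuda.so.1"):
          """Return cuInit's status code, or None if the driver library is absent."""
          try:
              libcuda = ctypes.CDLL(libname)
          except OSError:
              return None  # no CUDA driver library on this machine
          return libcuda.cuInit(0)

      if __name__ == "__main__":
          rc = probe_cuinit()
          print("libcuda.so.1 not found" if rc is None else f"cuInit returned {rc}")
      ```

      Comparing this probe's result in a fresh process versus one where torch has already been imported would help separate a driver/environment problem from the pre-initialization conflict.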
      Other Observation: 

      ```
      python -c "from numba import cuda; cuda.cudadrv.libs.test()"
      Finding driver from candidates:
      /usr/local/cuda/compat/libcuda.so.1
      Using loader <class 'ctypes.CDLL'>
      Trying to load driver... ok
      Loaded from /usr/local/cuda/compat/libcuda.so.1
      Mapped libcuda.so paths:
      /usr/local/cuda-13.0/compat/libcuda.so.580.95.05
      Finding nvvm from CUDA_HOME
      Located at /usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0
      Trying to open library... ok
      Finding nvrtc from CUDA_HOME
      Located at /usr/local/cuda/lib64/libnvrtc.so.13.0.88
      Trying to open library... ok
      Finding cudart from CUDA_HOME
      Located at /usr/local/cuda/lib64/libcudart.so.13.0.96
      Trying to open library... ok
      Finding cudadevrt from CUDA_HOME
      Located at /usr/local/cuda/lib64/libcudadevrt.a
      Checking library... ok
      Finding libdevice from CUDA_HOME
      Located at /usr/local/cuda/nvvm/libdevice/libdevice.10.bc
      Checking library... ok

      ```
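      One detail in the output above that may or may not be related: Numba resolves the forward-compat driver (libcuda.so.580.95.05) while the Environment section lists host driver 580.82.07. A small sketch (the helper name is illustrative) for pulling the version out of such a path when comparing the two:

      ```python
      # Sketch: extract the driver version embedded in a libcuda filename,
      # e.g. to compare the compat driver with the host driver version.
      import re

      def libcuda_version(path):
          """Return the version suffix of a libcuda.so.<version> path, or None."""
          m = re.search(r"libcuda\.so\.(\d+(?:\.\d+)+)$", path)
          return m.group(1) if m else None

      print(libcuda_version("/usr/local/cuda-13.0/compat/libcuda.so.580.95.05"))  # → 580.95.05
      ```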


              cheimes@redhat.com Christian Heimes
              rh-ee-vshaw Vikash Shaw
              Frank's Team