Bug
Resolution: Done
Critical
None
RHAIIS-3.3
Summary
The Numba CUDA probe reports a failure after installing vLLM because CUDA is eagerly initialized by PyTorch/Triton earlier in the process. When Numba subsequently attempts to initialize the CUDA driver, it receives CUDA_ERROR_OPERATING_SYSTEM (304), which is expected CUDA behavior in this scenario. The environment and GPU are healthy, and CUDA workloads run correctly through PyTorch/vLLM.
Debugging is in progress to determine whether this is a bug or expected behavior.
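A minimal sketch for testing the ordering hypothesis in the summary: run Numba's device selection in two fresh interpreters, once on its own and once after PyTorch has queried CUDA, and compare the outcomes. It assumes `torch` and `numba` are importable by the same Python; `run_isolated`, `NUMBA_ONLY` and `TORCH_FIRST` are hypothetical names, not part of the original repro.

```python
import subprocess
import sys

# Probe 1 (hypothetical): Numba initializes the CUDA driver first; torch is never imported.
NUMBA_ONLY = (
    "from numba import cuda\n"
    "cuda.select_device(0)\n"
    "print('numba-only OK')\n"
)

# Probe 2 (hypothetical): torch queries CUDA first, then Numba, mirroring the repro order.
TORCH_FIRST = (
    "import torch\n"
    "print('torch:', torch.cuda.is_available())\n"
    "from numba import cuda\n"
    "cuda.select_device(0)\n"
    "print('torch-first OK')\n"
)


def run_isolated(code: str, timeout: float = 120.0) -> tuple[int, str]:
    """Run `code` in a fresh interpreter so no CUDA state leaks between probes."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout + proc.stderr


if __name__ == "__main__":
    for name, code in (("numba-only", NUMBA_ONLY), ("torch-first", TORCH_FIRST)):
        rc, out = run_isolated(code)
        print(f"{name}: {'OK' if rc == 0 else f'FAILED (exit {rc})'}")
        lines = out.strip().splitlines()
        if rc != 0 and lines:
            print("  " + lines[-1])  # last line normally holds the exception
```

If the Numba-only probe fails with the same CUDA_ERROR_OPERATING_SYSTEM (304), PyTorch initializing CUDA first is not the trigger.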
Environment
- GPU: NVIDIA L4 (1 GPU)
- Driver: 580.82.07
- CUDA Version: 13.0
- OS: Linux
- Python: 3.12
- Packages involved:
  - numba
  - torch
Note: This also fails on the previous release, RHAIIS 3.2.5.
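To capture the environment details listed above in the same form on every run, a small collector can help. This is a sketch, assuming `nvidia-smi` is on PATH inside the container; `collect_environment` and the other helper names are hypothetical. Package versions come from `importlib.metadata`, and absent packages are reported as `None`.

```python
import json
import platform
import shutil
import subprocess
from importlib import metadata
from typing import Optional


def package_version(name: str) -> Optional[str]:
    """Installed distribution version, or None if the package is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def gpu_info() -> list[str]:
    """One 'name, driver_version' line per GPU from nvidia-smi, if it is available."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def collect_environment(packages: tuple[str, ...] = ("numba", "torch", "vllm")) -> dict:
    """Gather OS, Python, GPU/driver and package versions into one dict."""
    return {
        "os": platform.system(),
        "python": platform.python_version(),
        "gpus": gpu_info(),
        "packages": {name: package_version(name) for name in packages},
    }


if __name__ == "__main__":
    print(json.dumps(collect_environment(), indent=2))
```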
Reproduction Steps
- Create a container on an x86 system with CUDA hardware using the following image:
Container image: registry.gitlab.com/redhat/rhel-ai/core/base-images/app/aipcc-cuda13.0-el9.6-app-x86_64:ci_461
- Install the Torch and Numba wheels:
Torch wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/51406256
Numba wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/52183943
- Run the following Python code:
```python
# repro_vllm_numba_conflict.py
import torch  # triggers CUDA init
from numba import cuda
import numpy as np

print("Torch CUDA available:", torch.cuda.is_available())
print("Numba sees GPUs:", len(cuda.gpus))

# This is where it dies AFTER vllm install
cuda.select_device(0)
arr = cuda.device_array(10, dtype=np.float32)
print("SUCCESS (you will NOT see this)")
```
Error Observed:
numba.cuda.cudadrv.error.CudaSupportError: Call to cuInit results in CUDA_ERROR_OPERATING_SYSTEM (304)
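To separate Numba from the driver itself, `cuInit` can be called directly through `ctypes` against a specific `libcuda`. A minimal sketch: the CUresult values come from `cuda.h`, but the subset chosen, the `probe_cuinit` name, and the default of a bare `libcuda.so.1` (resolved by the dynamic loader) are assumptions for illustration.

```python
import ctypes
import sys

# Subset of CUresult values from cuda.h that are relevant to this issue.
CU_RESULT_NAMES = {
    0: "CUDA_SUCCESS",
    100: "CUDA_ERROR_NO_DEVICE",
    304: "CUDA_ERROR_OPERATING_SYSTEM",
    803: "CUDA_ERROR_SYSTEM_DRIVER_MISMATCH",
    804: "CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE",
}


def probe_cuinit(lib_path: str) -> str:
    """Load one libcuda and call cuInit(0) on it, bypassing Numba and torch."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError as exc:
        return f"could not load {lib_path}: {exc}"
    # CUresult cuInit(unsigned int Flags); Flags must be 0.
    lib.cuInit.argtypes = [ctypes.c_uint]
    lib.cuInit.restype = ctypes.c_int
    code = lib.cuInit(0)
    name = CU_RESULT_NAMES.get(code, "unrecognized CUresult")
    return f"{lib_path}: cuInit returned {name} ({code})"


if __name__ == "__main__":
    # Probe a single path per process; with no argument the loader picks libcuda.so.1.
    print(probe_cuinit(sys.argv[1] if len(sys.argv) > 1 else "libcuda.so.1"))
```

Run it once per library path, each in its own interpreter (for example, saved as a hypothetical `probe_cuinit.py` and invoked as `python probe_cuinit.py /usr/local/cuda/compat/libcuda.so.1`), so two copies of the driver library are never loaded into the same process.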
Other Observations:
Numba's driver and library self-test passes inside the container:
```
python -c "from numba import cuda; cuda.cudadrv.libs.test()"
Finding driver from candidates:
/usr/local/cuda/compat/libcuda.so.1
Using loader <class 'ctypes.CDLL'>
Trying to load driver... ok
Loaded from /usr/local/cuda/compat/libcuda.so.1
Mapped libcuda.so paths:
/usr/local/cuda-13.0/compat/libcuda.so.580.95.05
Finding nvvm from CUDA_HOME
Located at /usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0
Trying to open library... ok
Finding nvrtc from CUDA_HOME
Located at /usr/local/cuda/lib64/libnvrtc.so.13.0.88
Trying to open library... ok
Finding cudart from CUDA_HOME
Located at /usr/local/cuda/lib64/libcudart.so.13.0.96
Trying to open library... ok
Finding cudadevrt from CUDA_HOME
Located at /usr/local/cuda/lib64/libcudadevrt.a
Checking library... ok
Finding libdevice from CUDA_HOME
Located at /usr/local/cuda/nvvm/libdevice/libdevice.10.bc
Checking library... ok
```
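The self-test loads `libcuda.so.1` from `/usr/local/cuda/compat`, which maps to `libcuda.so.580.95.05`, while the host driver listed under Environment is 580.82.07. That difference may be worth checking. A minimal sketch, assuming the self-test output has been saved as text (for example with `python -c "from numba import cuda; cuda.cudadrv.libs.test()" > selftest.log 2>&1`), that extracts the loaded path and mapped versions and compares them with a host driver version supplied by the caller; the function and pattern names are hypothetical.

```python
import re

# Line such as "Loaded from /usr/local/cuda/compat/libcuda.so.1" (may be indented).
LOADED_RE = re.compile(r"^[ \t]*Loaded from (\S+)", re.MULTILINE)
# Fully versioned names such as "libcuda.so.580.95.05"; the bare
# "libcuda.so.1" soname and "libcudart.so.*" are deliberately not matched.
MAPPED_RE = re.compile(r"libcuda\.so\.(\d+(?:\.\d+)+)")


def compare_with_host(selftest_log: str, host_driver: str) -> dict:
    """Summarise which libcuda Numba loaded and whether it matches the host driver."""
    loaded = LOADED_RE.search(selftest_log)
    loaded_from = loaded.group(1) if loaded else None
    mapped = MAPPED_RE.findall(selftest_log)
    return {
        "loaded_from": loaded_from,
        "uses_compat": loaded_from is not None and "/compat/" in loaded_from,
        "mapped_versions": mapped,
        "host_driver": host_driver,
        # None when no versioned libcuda appears in the log.
        "matches_host": all(v == host_driver for v in mapped) if mapped else None,
    }
```

Pass the driver version reported by `nvidia-smi` as `host_driver`; for the log in this ticket the result flags a compat library (`580.95.05`) that differs from the host driver (`580.82.07`).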
- informs AIPCC-9520 Mismatch Between numba.cuda Driver Detection and nvidia-smi Output
- Refinement