AI Platform Core Components / AIPCC-9449

Numba CUDA fails when CUDA is pre-initialized by vLLM / PyTorch

    • Bug
    • Resolution: Done
    • Critical
    • RHAIIS-3.3
    • Accelerator Enablement
    • AIPCC Accelerators 25

      Summary

      The Numba CUDA probe reports a failure after installing vllm because CUDA is eagerly initialized by PyTorch/triton earlier in the process. When Numba subsequently attempts to initialize the CUDA driver, it receives CUDA_ERROR_OPERATING_SYSTEM (304), which is expected CUDA behavior in this scenario. The environment and GPU are healthy, and CUDA workloads function correctly via PyTorch/vLLM.

      Debugging is in progress to determine whether this is a bug or correct behavior.
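      The eager-initialization diagnosis above can be checked directly: on Linux, once PyTorch (or triton) has initialized CUDA, the driver library shows up in the process's memory maps. A minimal sketch (the helper name and the maps-path parameter are illustrative, not from the report):

      ```python
      # Sketch: detect whether the CUDA driver is already mapped into this
      # process, i.e. an earlier import (torch/triton) initialized CUDA first.
      # Linux-only; relies on /proc/self/maps.
      def cuda_driver_loaded(maps_path="/proc/self/maps"):
          try:
              with open(maps_path) as f:
                  return any("libcuda.so" in line for line in f)
          except OSError:
              return False  # no /proc on this platform, or path unreadable

      if __name__ == "__main__":
          print("CUDA driver already loaded:", cuda_driver_loaded())
      ```

      Running this after `import torch` but before any Numba call would confirm that the driver is already live when Numba attempts its own initialization.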

      Environment

      • GPU: NVIDIA L4 (1 GPU)
      • Driver: 580.82.07
      • CUDA Version: 13.0
      • OS: Linux
      • Python: 3.12
      • Packages involved:
        • numba
        • torch

      Note: This also fails on the previous RHAIIS 3.2.5 release.

      Reproduction Steps

      1. Create a container on an x86 system with CUDA hardware using the container image below:
         Container image: registry.gitlab.com/redhat/rhel-ai/core/base-images/app/aipcc-cuda13.0-el9.6-app-x86_64:ci_461
      2. Install the Torch and Numba wheels:
         Torch wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/51406256
         Numba wheel: https://gitlab.com/redhat/rhel-ai/rhaiis/indexes/rhaiis-3.3/cuda13.0-ubi9-x86_64/-/packages/52183943
      3. Run the following Python code:
         
      ```
      # repro_vllm_numba_conflict.py
      import torch  # importing torch eagerly initializes CUDA
      from numba import cuda
      import numpy as np

      print("Torch CUDA available:", torch.cuda.is_available())
      print("Numba sees GPUs:", len(cuda.gpus))

      # This is where it dies after the vllm install:
      cuda.select_device(0)
      arr = cuda.device_array(10, dtype=np.float32)
      print("SUCCESS (you will NOT see this)")
      ```
      Error observed:

      ```
      numba.cuda.cudadrv.error.CudaSupportError:
      Call to cuInit results in CUDA_ERROR_OPERATING_SYSTEM (304)
      ```
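      The cuInit status can also be read without Numba in the process at all. The sketch below (the function name and fallback behavior are illustrative) loads the driver via ctypes and prints the raw return code, which should reproduce the 304 under the same conditions:

      ```python
      # Sketch: call cuInit directly via ctypes and report its raw status code.
      # 0 == CUDA_SUCCESS; 304 == CUDA_ERROR_OPERATING_SYSTEM (as in this issue).
      import ctypes

      def probe_cuinit(libname="libcuda.so.1"):
          """Return cuInit's status code, or None if the driver library is absent."""
          try:
              libcuda = ctypes.CDLL(libname)
          except OSError:
              return None  # no CUDA driver library on this machine
          return libcuda.cuInit(0)

      if __name__ == "__main__":
          rc = probe_cuinit()
          print("libcuda.so.1 not found" if rc is None else f"cuInit returned {rc}")
      ```

      Comparing this probe's result in a fresh process versus one where torch has already been imported would help separate a driver/environment problem from the pre-initialization conflict.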
      Other Observation: 

      ```
      python -c "from numba import cuda; cuda.cudadrv.libs.test()"
      Finding driver from candidates:
      /usr/local/cuda/compat/libcuda.so.1
      Using loader <class 'ctypes.CDLL'>
      Trying to load driver... ok
      Loaded from /usr/local/cuda/compat/libcuda.so.1
      Mapped libcuda.so paths:
      /usr/local/cuda-13.0/compat/libcuda.so.580.95.05
      Finding nvvm from CUDA_HOME
      Located at /usr/local/cuda/nvvm/lib64/libnvvm.so.4.0.0
      Trying to open library... ok
      Finding nvrtc from CUDA_HOME
      Located at /usr/local/cuda/lib64/libnvrtc.so.13.0.88
      Trying to open library... ok
      Finding cudart from CUDA_HOME
      Located at /usr/local/cuda/lib64/libcudart.so.13.0.96
      Trying to open library... ok
      Finding cudadevrt from CUDA_HOME
      Located at /usr/local/cuda/lib64/libcudadevrt.a
      Checking library... ok
      Finding libdevice from CUDA_HOME
      Located at /usr/local/cuda/nvvm/libdevice/libdevice.10.bc
      Checking library... ok

      ```
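      One detail in the output above that may or may not be related: Numba resolves the forward-compat driver (libcuda.so.580.95.05) while the Environment section lists host driver 580.82.07. A small sketch (the helper name is illustrative) for pulling the version out of such a path when comparing the two:

      ```python
      # Sketch: extract the driver version embedded in a libcuda filename,
      # e.g. to compare the compat driver with the host driver version.
      import re

      def libcuda_version(path):
          """Return the version suffix of a libcuda.so.<version> path, or None."""
          m = re.search(r"libcuda\.so\.(\d+(?:\.\d+)+)$", path)
          return m.group(1) if m else None

      print(libcuda_version("/usr/local/cuda-13.0/compat/libcuda.so.580.95.05"))  # → 580.95.05
      ```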


              cheimes@redhat.com Christian Heimes
              rh-ee-vshaw Vikash Shaw
              Frank's Team