Feature
Resolution: Unresolved
Feature title: Add facade NVIDIA Python packages to CUDA base image
Feature Overview:
Upstream PyPI packages such as PyTorch depend on several NVIDIA Python wheels, for example nvidia-cuda-runtime-cu12. Downstream, we use system packages for the CUDA stack, and neither our base images nor the downstream Python index provides these wheels. We should pre-populate the CUDA base image with facade packages that point to the system libraries and headers.
Product(s) associated:
RHAIIS: no
RHEL AI: no
RHOAI: yes
Goals:
- Our deliverables look more like upstream content.
- Users can determine the CUDA version and the installed packages with pip list or uv pip list, as with upstream content.
- Users get better feedback when they try to mix upstream and downstream content: pip refuses to remove or update a facade package.
Requirements:
- The CUDA base image comes with facade packages pre-installed.
- Package names and package versions are taken from the RPMs.
- Every package has proper METADATA and INSTALLER files in its dist-info directory, but no RECORD file.
- The lib and include subdirectories point to the correct system locations (nvshmem is different!), for example:
  - nvidia/cuda_runtime/lib -> /usr/local/cuda/lib64
  - nvidia/cuda_runtime/include -> /usr/local/cuda/include
Because the packages must match the RPMs and contain symlinks to external resources (which the wheel format does not support), they could be created by a script.
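A minimal sketch of what such a script could look like, assuming Python and a throwaway target directory. The function name create_facade_package, its parameters, and the Metadata-Version value are illustrative assumptions, not part of this feature request. The symlink mapping is passed in explicitly because the system locations differ per package (nvshmem, for instance).

```python
import os
import tempfile
from pathlib import Path


def create_facade_package(site_packages, name, version, module_dir, links,
                          installer="RHAI system package"):
    """Create a facade distribution that points at system files.

    site_packages: directory that receives the package
    name:          distribution name, e.g. "nvidia-cuda-runtime-cu12"
    version:       version string (taken from the RPM in a real script)
    module_dir:    path below site_packages, e.g. "nvidia/cuda_runtime"
    links:         mapping of subdirectory name -> system directory
    """
    site_packages = Path(site_packages)

    # dist-info directory names use underscores instead of dashes.
    dist_info = site_packages / f"{name.replace('-', '_')}-{version}.dist-info"
    dist_info.mkdir(parents=True, exist_ok=True)

    # Minimal core metadata: metadata version, name, and version.
    (dist_info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n"
    )
    # pip quotes the content of INSTALLER in its "Cannot uninstall" hint.
    (dist_info / "INSTALLER").write_text(f"{installer}\n")
    # No RECORD file is written on purpose.

    # Symlink lib/ and include/ to the system locations.
    package_dir = site_packages / module_dir
    package_dir.mkdir(parents=True, exist_ok=True)
    for subdir, target in links.items():
        os.symlink(target, package_dir / subdir)
    return dist_info


# Demo in a temporary directory instead of the image's real site-packages.
demo_root = tempfile.mkdtemp()
dist_info = create_facade_package(
    demo_root,
    "nvidia-cuda-runtime-cu12",
    "12.8.90",
    "nvidia/cuda_runtime",
    {"lib": "/usr/local/cuda/lib64", "include": "/usr/local/cuda/include"},
)
print(sorted(p.name for p in dist_info.iterdir()))
```

The symlinks are created even if the target directories do not exist yet, so the sketch also runs on a machine without CUDA installed.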
Done - Acceptance Criteria:
All NVIDIA Python packages typically used by Torch and vLLM are present in the base image.
Use Cases - i.e. User Experience & Workflow:
A user tries to replace an NVIDIA Python wheel and gets a clear error message.
Out of Scope:
TBD
Documentation Considerations:
The presence of the packages needs to be documented, at least internally in the base image repo.
Details
List for Torch 2.9.0:
nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.3.20
nvidia-nvtx-cu12==12.8.90
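As an illustration of how the pins above map to on-disk names, the following sketch derives the dist-info directory name for each entry. The helper name dist_info_dirname and the PINS constant are assumptions made for this example; the name normalization follows the usual packaging rule (lowercase, runs of "-", "_" and "." collapsed, then dashes replaced by underscores).

```python
import re

# The Torch 2.9.0 pins from the list above.
PINS = """
nvidia-cublas-cu12==12.8.4.1 nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93 nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21 nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3 nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90 nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1 nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93 nvidia-nvshmem-cu12==3.3.20
nvidia-nvtx-cu12==12.8.90
"""


def dist_info_dirname(requirement):
    """Turn 'name==version' into the matching dist-info directory name."""
    name, version = requirement.split("==")
    # Normalize the project name, then swap dashes for underscores.
    normalized = re.sub(r"[-_.]+", "-", name).lower()
    return f"{normalized.replace('-', '_')}-{version}.dist-info"


for requirement in PINS.split():
    print(dist_info_dirname(requirement))
```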
pip refusing to update a package:
$ rm nvidia_cuda_runtime_cu12-12.8.90.dist-info/RECORD
$ echo "RHAI system package" > nvidia_cuda_runtime_cu12-12.8.90.dist-info/INSTALLER
$ pip install -U nvidia_cuda_runtime_cu12
...
ERROR: Cannot uninstall nvidia-cuda-runtime-cu12 12.8.90, RECORD file not found. Hint: The package was installed by RHAI system package.
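The state that triggers this error can also be inspected from Python. The sketch below uses the standard library's importlib.metadata to read the INSTALLER text and to check whether RECORD exists. The helper name describe_dist_info is an assumption for this example, and the demo builds a throwaway dist-info directory rather than touching a real installation.

```python
import tempfile
from importlib.metadata import Distribution
from pathlib import Path


def describe_dist_info(dist_info_dir):
    """Report name, version, installer, and RECORD presence for a dist-info dir."""
    dist = Distribution.at(dist_info_dir)
    # read_text() returns None when the file is missing.
    installer = dist.read_text("INSTALLER")
    return {
        "name": dist.metadata["Name"],
        "version": dist.version,
        "installer": installer.strip() if installer else None,
        "has_record": dist.read_text("RECORD") is not None,
    }


# Demo: a dist-info directory with METADATA and INSTALLER but no RECORD.
demo_dir = Path(tempfile.mkdtemp()) / "nvidia_cuda_runtime_cu12-12.8.90.dist-info"
demo_dir.mkdir()
(demo_dir / "METADATA").write_text(
    "Metadata-Version: 2.1\nName: nvidia-cuda-runtime-cu12\nVersion: 12.8.90\n"
)
(demo_dir / "INSTALLER").write_text("RHAI system package\n")

info = describe_dist_info(demo_dir)
print(info)
```

A check like this could serve as a smoke test for the base image: every facade package should report the expected installer string and no RECORD file.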