Type: Feature
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: Accelerator Enablement
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False

Feature title: Add facade NVIDIA Python packages to CUDA base image

Feature Overview:
Upstream PyPI packages such as PyTorch depend on several NVIDIA Python wheels, for example nvidia-cuda-runtime-cu12. Downstream, we use system packages for the CUDA stack, and neither our base images nor our downstream Python index provide these wheels. We should pre-populate the CUDA base image with facade packages that point to the system libraries and headers.

Product(s) associated:

RHAIIS: no
RHEL AI: no
RHOAI: yes

Goals:

Our deliverables look more like upstream content.
Users can find out the CUDA version and installed packages with pip list or uv pip list (as with upstream content).
Users get better feedback when they try to mix upstream with downstream content: pip refuses to remove or update a facade package.
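For illustration: because facade packages carry ordinary dist-info metadata, they are visible not only to pip list but also to the standard library's importlib.metadata. A minimal sketch, assuming the facades sit in a directory on sys.path like any other dist-info (the function name list_nvidia_packages is hypothetical), that lists installed nvidia-* distributions with the tool recorded in their INSTALLER file:

```python
from importlib.metadata import distributions


def list_nvidia_packages(path=None):
    """Return sorted (name, version, installer) tuples for nvidia-* distributions.

    `path` is a list of directories to scan; None means sys.path.
    """
    dists = distributions() if path is None else distributions(path=path)
    found = []
    for dist in dists:
        name = dist.metadata.get("Name") or ""
        if name.lower().startswith("nvidia-"):
            # INSTALLER is optional; read_text() returns None if the file is absent.
            installer = (dist.read_text("INSTALLER") or "").strip()
            found.append((name, dist.version, installer))
    return sorted(found)


if __name__ == "__main__":
    for name, version, installer in list_nvidia_packages():
        print(f"{name}=={version}  installer: {installer or 'unknown'}")
```

In an image with facades installed, the INSTALLER column would distinguish system-provided packages from wheels installed by pip or uv.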

Requirements:

The CUDA base image comes with the facade packages pre-installed.
Package names and versions are taken from the corresponding RPMs.
All packages have proper METADATA and INSTALLER files in the dist-info directory, but no RECORD file.
The lib and include subdirectories are symlinks to the correct system locations (nvshmem's locations differ!), e.g.:
- nvidia/cuda_runtime/lib -> /usr/local/cuda/lib64
- nvidia/cuda_runtime/include -> /usr/local/cuda/include

Since the packages need to match the RPMs and contain symlinks to external resources (which the wheel format does not support), they could be created by a script instead of being built as wheels.
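A minimal sketch of such a script, assuming the caller passes in the site-packages directory, the version, and the link mapping (create_facade and dist_info_name are hypothetical names, not an existing tool). It writes METADATA and INSTALLER, deliberately omits RECORD, and creates the lib/include symlinks. Looking up the version from the RPM (for example with `rpm -q --queryformat '%{VERSION}' <rpm-name>`) is left to the caller, and the sketch does not create any `__init__.py` files for the `nvidia/` directories.

```python
import os
import re
from pathlib import Path


def dist_info_name(name: str, version: str) -> str:
    """Directory name for a dist-info, e.g. nvidia_cuda_runtime_cu12-12.8.90.dist-info."""
    # Runs of "-", "_" and "." become a single "_", lowercased (wheel naming rules).
    normalized = re.sub(r"[-_.]+", "_", name).lower()
    return f"{normalized}-{version}.dist-info"


def create_facade(site_packages, name, version, links, installer="RHAI system package"):
    """Create a facade package: METADATA + INSTALLER, no RECORD, plus symlinks.

    `links` maps paths relative to site-packages (e.g. "nvidia/cuda_runtime/lib")
    to the system directories they should point to.
    """
    site = Path(site_packages)
    dist_info = site / dist_info_name(name, version)
    dist_info.mkdir(parents=True)  # raises FileExistsError if the facade already exists
    (dist_info / "METADATA").write_text(
        f"Metadata-Version: 2.1\nName: {name}\nVersion: {version}\n"
    )
    # No RECORD is written: pip refuses to uninstall without it,
    # and its error hint names the tool recorded in INSTALLER.
    (dist_info / "INSTALLER").write_text(installer + "\n")
    for rel_path, target in links.items():
        link = site / rel_path
        link.parent.mkdir(parents=True, exist_ok=True)
        os.symlink(target, link, target_is_directory=True)
    return dist_info
```

For example, `create_facade("/path/to/site-packages", "nvidia-cuda-runtime-cu12", "12.8.90", {"nvidia/cuda_runtime/lib": "/usr/local/cuda/lib64", "nvidia/cuda_runtime/include": "/usr/local/cuda/include"})` would produce the layout from the requirements above. The nvshmem targets are not spelled out in this ticket and would need their own mapping.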

Done - Acceptance Criteria:
All NVIDIA Python packages typically used by Torch and vLLM are present in the base image.

Use Cases - i.e. User Experience & Workflow:
Users try to replace an NVIDIA Python wheel and get a clear error message.

Out of Scope:
TBD

Documentation Considerations:
The presence of the facade packages needs to be documented, at least internally in the base image repo.

Details

Package list for Torch 2.9.0:

nvidia-cublas-cu12==12.8.4.1
nvidia-cuda-cupti-cu12==12.8.90
nvidia-cuda-nvrtc-cu12==12.8.93
nvidia-cuda-runtime-cu12==12.8.90
nvidia-cudnn-cu12==9.10.2.21
nvidia-cufft-cu12==11.3.3.83
nvidia-cufile-cu12==1.13.1.3
nvidia-curand-cu12==10.3.9.90
nvidia-cusolver-cu12==11.7.3.90
nvidia-cusparse-cu12==12.5.8.93
nvidia-cusparselt-cu12==0.7.1
nvidia-nccl-cu12==2.27.5
nvidia-nvjitlink-cu12==12.8.93
nvidia-nvshmem-cu12==3.3.20
nvidia-nvtx-cu12==12.8.90
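A sketch of an acceptance check for this list, assuming facade versions must equal the Torch pins exactly (plain string comparison after name normalization). check_facades and TORCH_2_9_0 are hypothetical names; `path` lets the check scan a specific site-packages directory instead of sys.path.

```python
import re
from importlib.metadata import distributions

# Pins for Torch 2.9.0, copied from the list above.
TORCH_2_9_0 = {
    "nvidia-cublas-cu12": "12.8.4.1",
    "nvidia-cuda-cupti-cu12": "12.8.90",
    "nvidia-cuda-nvrtc-cu12": "12.8.93",
    "nvidia-cuda-runtime-cu12": "12.8.90",
    "nvidia-cudnn-cu12": "9.10.2.21",
    "nvidia-cufft-cu12": "11.3.3.83",
    "nvidia-cufile-cu12": "1.13.1.3",
    "nvidia-curand-cu12": "10.3.9.90",
    "nvidia-cusolver-cu12": "11.7.3.90",
    "nvidia-cusparse-cu12": "12.5.8.93",
    "nvidia-cusparselt-cu12": "0.7.1",
    "nvidia-nccl-cu12": "2.27.5",
    "nvidia-nvjitlink-cu12": "12.8.93",
    "nvidia-nvshmem-cu12": "3.3.20",
    "nvidia-nvtx-cu12": "12.8.90",
}


def canonical(name):
    # PEP 503 normalization: runs of "-", "_", "." become "-", lowercased.
    return re.sub(r"[-_.]+", "-", name).lower()


def check_facades(expected, path=None):
    """Return {name: problem} for expected packages that are missing or mismatched."""
    dists = distributions() if path is None else distributions(path=path)
    installed = {}
    for d in dists:
        # Keep the first match on the path, as importlib.metadata.version() does.
        installed.setdefault(canonical(d.metadata.get("Name") or ""), d.version)
    problems = {}
    for name, want in expected.items():
        have = installed.get(canonical(name))
        if have is None:
            problems[name] = "missing"
        elif have != want:
            problems[name] = f"expected {want}, found {have}"
    return problems


if __name__ == "__main__":
    for name, problem in sorted(check_facades(TORCH_2_9_0).items()):
        print(f"{name}: {problem}")
```

An empty result would mean every package in the list is present with the pinned version.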

pip refusing to update a package (simulated on an upstream install by removing RECORD and replacing INSTALLER):

$ rm nvidia_cuda_runtime_cu12-12.8.90.dist-info/RECORD
$ echo "RHAI system package" > nvidia_cuda_runtime_cu12-12.8.90.dist-info/INSTALLER
$ pip install -U nvidia_cuda_runtime_cu12
...
ERROR: Cannot uninstall nvidia-cuda-runtime-cu12 12.8.90, RECORD file not found. Hint: The package was installed by RHAI system package.

Assignee: Unassigned

Reporter: Christian Heimes

Votes: 0

Watchers: 2

Created: 2025/11/27 11:25 AM

Updated: 2025/11/27 11:27 AM
