Type: Epic
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Epic Name:
DRA: e2e test suite that validates Nvidia GPU
Epic Status:
Done
Activity Type:
Product / Portfolio Work
Parent Link:
OCPSTRAT-2384 Attribute-Based GPU Allocation in OpenShift - GA 4.21
Hierarchy Progress Bar:

36% To Do, 0% In Progress, 64% Done
Blocked:
False
Blocked Reason:

None
Ready:
False
Color Status:
Not Selected
Size:
None

Target Version:
None
Release Blocker:
None

This epic tracks the follow-up work required to get the DRA e2e suite running in OpenShift CI as a periodic job.

Goal:

A periodic job in CI that provisions a cluster with a GPU worker node and runs the e2e suite.

Non Goal:

Our focus is limited to validating workloads with Nvidia GPUs (using the DRA driver); we will not add support for any other vendor.

The e2e suite is being worked on in https://github.com/openshift/origin/pull/29842. It currently covers the following use cases:

A common test spec (one pod, one container asking for a distinct GPU) that can be validated against both the example DRA driver and the Nvidia DRA driver. The spec is expected to pass on both.
Two containers, each asking for a distinct GPU; neither container should have access to the other's GPU
MPS (Multi-Process Service) strategy
TimeSlicing strategy
Static pre-partitioned MIG (Multi-Instance GPU) slices
IPC using CUDA
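
The two-container isolation case above can be sketched as a small check. This is only an illustration, not the code in the linked PR: it assumes the test captures the output of `nvidia-smi -L` from each container and compares the GPU UUIDs, and the names `visibleGPUs` and `checkIsolation` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// uuidPattern matches the UUID that `nvidia-smi -L` prints for each full GPU,
// e.g. "GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-1b2c...)".
var uuidPattern = regexp.MustCompile(`UUID: (GPU-[0-9a-fA-F-]+)`)

// visibleGPUs extracts the GPU UUIDs from the `nvidia-smi -L` output
// captured inside one container (hypothetical helper).
func visibleGPUs(output string) []string {
	var uuids []string
	for _, m := range uuidPattern.FindAllStringSubmatch(output, -1) {
		uuids = append(uuids, m[1])
	}
	return uuids
}

// checkIsolation returns an error unless every container sees exactly one
// GPU and no two containers see the same one (hypothetical helper).
func checkIsolation(outputs map[string]string) error {
	owner := map[string]string{} // GPU UUID -> container that sees it
	for name, out := range outputs {
		gpus := visibleGPUs(out)
		if len(gpus) != 1 {
			return fmt.Errorf("container %q sees %d GPUs, want exactly 1", name, len(gpus))
		}
		if other, ok := owner[gpus[0]]; ok {
			return fmt.Errorf("containers %q and %q share GPU %s", other, name, gpus[0])
		}
		owner[gpus[0]] = name
	}
	return nil
}

func main() {
	// Made-up sample output; a real test would exec into each container.
	outputs := map[string]string{
		"ctr0": "GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-11111111-aaaa-bbbb-cccc-000000000001)\n",
		"ctr1": "GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-22222222-aaaa-bbbb-cccc-000000000002)\n",
	}
	if err := checkIsolation(outputs); err != nil {
		fmt.Println("isolation check failed:", err)
		return
	}
	fmt.Println("each container sees a distinct GPU")
}
```

Comparing UUIDs rather than device indices matters here because each container would typically see its own device as `GPU 0`, so the index alone cannot show that the devices differ.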

Constraints:

The Nvidia DRA driver is not part of the GPU operator yet. For now, we install the Nvidia DRA driver using helm from Nvidia's official repo: https://catalog.ngc.nvidia.com/orgs/nvidia/helm-charts/nvidia-dra-driver-gpu
The integration of the DRA driver into the GPU operator is being done in https://github.com/NVIDIA/gpu-operator/pull/1541. Once the GPU operator adds the DRA driver as an operand, we can install the driver using the clusterpolicy API of the operator.

Assignee: Sai Ramesh Vanka

Reporter: Abu Kashem (Inactive)

Need Info From: None

Contributors: None

QA Contact: Aditi Sahay

Doc Contact: None

Votes: 0

Watchers: 2

Created: 2025/09/17 1:38 PM

Updated: 2026/02/10 3:23 PM

Resolved: 2026/01/07 7:28 AM