Feature
Resolution: Unresolved
BU Product Work
50% To Do, 0% In Progress, 50% Done
Feature Overview
Enable NVIDIA GPUs on Oracle Cloud Infrastructure (OCI) for the OpenShift platform, allowing users to seamlessly use GPU acceleration for their AI workloads.
Goals
- Users can provision and configure NVIDIA GPU instances on Oracle Cloud Infrastructure within the OpenShift platform.
- Users can seamlessly integrate GPU-accelerated libraries and frameworks into their AI workloads running on OpenShift.
- Users can efficiently scale their GPU resources based on workload demands.
- Users can monitor and manage GPU instances and performance metrics within the OpenShift console.
- Users can share GPU memory between OCI worker nodes.
Requirements
- Containers can run CUDA workloads that use one or more GPUs
- Multi-Instance GPU (MIG) works on A100 shapes
- RDMA enabled by the NVIDIA Network Operator works for sharing GPU memory
- RDMA enabled through dma-buf works for sharing GPU memory
- Enablement is validated on two OCI Compute shapes: BM.GPU4 and BM.GPU.A100
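As an illustration of the first two requirements, here is a minimal sketch that assumes the NVIDIA GPU Operator's device plugin runs on the OCI worker nodes and advertises whole GPUs as the `nvidia.com/gpu` extended resource. It also assumes that with the MIG "mixed" strategy, MIG slices are advertised as `nvidia.com/mig-<profile>`. The helper `gpu_pod_manifest`, the pod names, the `CUDA_IMAGE` placeholder, and the `1g.5gb` profile are hypothetical; the MIG profiles actually available depend on the A100 model in the shape. The sketch builds Pod manifests as Python dicts and prints them as JSON.

```python
import json

# Hypothetical placeholder: substitute a real CUDA-capable container image.
CUDA_IMAGE = "CUDA_IMAGE"


def gpu_pod_manifest(name: str, image: str, gpus: int = 1,
                     resource: str = "nvidia.com/gpu") -> dict:
    """Hypothetical helper: build a Pod manifest whose single container
    requests `gpus` units of the given GPU extended resource."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Lists the GPUs visible to the container (assumes
                    # nvidia-smi is available inside the container).
                    "command": ["nvidia-smi", "-L"],
                    # Extended resources are set as limits; the request
                    # defaults to the same value when only limits are given.
                    "resources": {"limits": {resource: gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Whole-GPU request: one container using two GPUs.
    multi = gpu_pod_manifest("cuda-multi-gpu", CUDA_IMAGE, gpus=2)
    print(json.dumps(multi, indent=2))

    # MIG slice request under the assumed "mixed" strategy;
    # the profile name is an assumption.
    mig = gpu_pod_manifest("cuda-mig", CUDA_IMAGE,
                           resource="nvidia.com/mig-1g.5gb")
    print(json.dumps(mig, indent=2))
```

Each printed manifest could be saved to a file and applied with `oc apply -f <file>`. If `nvidia-smi -L` lists the expected GPUs or MIG devices in the pod log, that would be one simple signal that the device plugin exposes them correctly.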
Use Cases
- Use Case 1: User provisions a GPU-enabled OCI instance through the OpenShift console and deploys an AI workload that leverages GPU acceleration.
- Use Case 2: User scales up the GPU resources for an AI workload running on OpenShift to handle increased demand and achieve faster processing.
- Use Case 3: User shares GPU memory between two containers running on two separate OpenShift worker nodes.
- Use Case 4: User monitors GPU instance utilization through the OpenShift console.
Documentation Considerations
- The NVIDIA GPU operator community documentation should be updated in the "Bare Metal / Virtual Machines with GPU Passthrough" section to confirm OCI support and list which OCI shapes are supported by NVIDIA.
- The NVIDIA AI Enterprise support matrix should be updated to confirm OCI support and list which OCI shapes are supported by NVIDIA.
- depends on
  - OCPSTRAT-1203 [GA] OpenShift on Oracle Cloud Infrastructure (OCI) Bare metal (In Progress)
  - OCPSTRAT-174 [Dev Preview] OpenShift on Oracle Cloud Infrastructure (OCI) Bare metal (Closed)
  - OCPSTRAT-510 [Dev Preview] OpenShift on Oracle Cloud Infrastructure (OCI) with VMs (Closed)
  - OCPSTRAT-949 [Tech Preview] OpenShift on Oracle Cloud Infrastructure (OCI) with VMs (Closed)
- is incorporated by
  - OCPSTRAT-1077 [GA] OpenShift on Oracle Cloud Infrastructure (OCI) with VMs (Closed)
- relates to
  - OCPSTRAT-510 [Dev Preview] OpenShift on Oracle Cloud Infrastructure (OCI) with VMs (Closed)