  1. OpenShift Container Platform (OCP) Strategy
  2. OCPSTRAT-665

NVIDIA GPUs on Oracle Cloud Infrastructure

    • BU Product Work
    • 50% To Do, 0% In Progress, 50% Done

      Feature Overview

      Enable NVIDIA GPUs on Oracle Cloud Infrastructure (OCI) for the OpenShift platform, so that users can run GPU-accelerated AI workloads on OpenShift clusters hosted on OCI.

      Goals

      • Users can provision and configure NVIDIA GPU instances on Oracle Cloud Infrastructure within the OpenShift platform.
      • Users can seamlessly integrate GPU-accelerated libraries and frameworks into their AI workloads running on OpenShift.
      • Users can efficiently scale their GPU resources based on workload demands.
      • Users can monitor and manage GPU instances and performance metrics within the OpenShift console.
      • Users can share GPU memory between OCI worker nodes over RDMA.

      Requirements 

      • Containers can run CUDA workloads and use one or more GPUs
      • Multi-Instance GPU (MIG) works on A100 shapes
      • RDMA, enabled by the NVIDIA Network Operator, works for sharing GPU memory between worker nodes
      • RDMA, enabled by dma-buf, works for sharing GPU memory between worker nodes
      • Enablement is validated on two OCI Compute shapes: BM.GPU4 and BM.GPU.A100
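
A minimal sketch of how the first two requirements could be exercised, assuming the GPUs are exposed through the NVIDIA device plugin as the extended resource `nvidia.com/gpu`, and that MIG slices are advertised as `nvidia.com/mig-<profile>` (the "mixed" MIG strategy). The helper name `gpu_pod_manifest`, the pod name, and the image reference are hypothetical placeholders, not part of this feature.

```python
import json


def gpu_pod_manifest(name, image, gpus=1, mig_profile=None):
    """Build a Pod manifest (as a dict) that requests NVIDIA GPU resources.

    Assumption: full GPUs are advertised as "nvidia.com/gpu" and, with the
    "mixed" MIG strategy, slices as "nvidia.com/mig-<profile>".
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    if mig_profile:
        resource = f"nvidia.com/mig-{mig_profile}"
    else:
        resource = "nvidia.com/gpu"
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "cuda",
                    "image": image,  # hypothetical CUDA test image
                    # Extended resources are requested through limits.
                    "resources": {"limits": {resource: gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder image reference; substitute a real CUDA sample image.
    pod = gpu_pod_manifest(
        "cuda-test", "registry.example.com/cuda-sample:placeholder", gpus=2
    )
    print(json.dumps(pod, indent=2))
```

The printed JSON could be saved to a file and applied with `oc apply -f`. Passing `mig_profile="1g.5gb"` instead would request one MIG slice on an A100, provided that profile is configured on the node.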

      Use Cases

      • Use Case 1: User provisions a GPU-enabled OCI instance through the OpenShift console and deploys an AI workload that leverages GPU acceleration.
      • Use Case 2: User scales up the GPU resources for an AI workload running on OpenShift to handle increased demand and achieve faster processing.
      • Use Case 3: User shares GPU memory between two containers running on two separate OpenShift worker nodes.
      • Use Case 4: User monitors GPU instance utilization through the OpenShift console.
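
For Use Case 4, a small sketch of how per-GPU utilization might be read outside the console. It assumes the GPU metrics are published in Prometheus text format by the NVIDIA DCGM exporter under the metric name `DCGM_FI_DEV_GPU_UTIL`, with `gpu` and `Hostname` labels; this ticket does not specify the metrics source, so treat those names as assumptions. The function name `gpu_utilization` and the sample values are illustrative only.

```python
import re

# One Prometheus sample line: metric_name{label="value",...} value
_SAMPLE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
_LABEL = re.compile(r'(\w+)="([^"]*)"')


def gpu_utilization(metrics_text, metric="DCGM_FI_DEV_GPU_UTIL"):
    """Return {(hostname, gpu_index): utilization_percent} for one metric."""
    result = {}
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(match.group("labels")))
        key = (labels.get("Hostname", ""), labels.get("gpu", ""))
        result[key] = float(match.group("value"))
    return result


# Made-up sample in Prometheus text format, for illustration only.
SAMPLE_METRICS = """\
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",Hostname="worker-0"} 87
DCGM_FI_DEV_GPU_UTIL{gpu="1",Hostname="worker-0"} 12
DCGM_FI_DEV_FB_USED{gpu="0",Hostname="worker-0"} 1024
"""

if __name__ == "__main__":
    for (host, gpu), util in sorted(gpu_utilization(SAMPLE_METRICS).items()):
        print(f"{host} gpu{gpu}: {util:.0f}%")
```

The same utilization values could feed the scaling decision in Use Case 2, for example by adding GPU capacity when every GPU on a node stays busy.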

      Documentation Considerations

      • The NVIDIA GPU operator community documentation should be updated in the "Bare Metal / Virtual Machines with GPU Passthrough" section to confirm OCI support and list which OCI shapes are supported by NVIDIA.
      • The NVIDIA AI Enterprise support matrix should be updated to confirm OCI support and list which OCI shapes are supported by NVIDIA.

            egallen Erwan Gallen