Task
Resolution: Done
PSAP Sprint 210
Abstract:
NVIDIA Ampere GPUs (A100 and A30) support Multi-Instance GPU (MIG), a distinctive feature that dynamically slices the GPU into multiple GPU instances running in isolation from one another, with guaranteed QoS.
In this presentation, we first describe the work we did in collaboration with NVIDIA to support MIG reconfiguration in the GPU Operator. Reconfiguration is triggered simply by updating a node label.
In the second part of the session, we present AI/ML benchmarks of the GPU that measure the compute performance of the different instance sizes. We also validate the isolation of the instances by running multiple workloads in parallel.
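As a rough illustration of the label-driven reconfiguration mentioned in the abstract, the sketch below builds the patch body and the equivalent `kubectl` command for changing a node's MIG layout. It assumes the GPU Operator watches the `nvidia.com/mig.config` node label; the node name `gpu-node-0` and the profile `all-1g.5gb` are illustrative placeholders, the valid profile names depend on the MIG configuration deployed in the cluster, and the helper functions are hypothetical rather than part of any library.

```python
import json

# Node label assumed to be watched by the GPU Operator for MIG changes.
MIG_CONFIG_LABEL = "nvidia.com/mig.config"


def mig_label_patch(profile: str) -> dict:
    """Build a merge-patch body that sets the MIG configuration label."""
    if not profile:
        raise ValueError("profile must be a non-empty string")
    return {"metadata": {"labels": {MIG_CONFIG_LABEL: profile}}}


def kubectl_command(node: str, profile: str) -> list[str]:
    """Return the equivalent kubectl invocation as an argument list."""
    return [
        "kubectl", "label", "node", node,
        f"{MIG_CONFIG_LABEL}={profile}", "--overwrite",
    ]


if __name__ == "__main__":
    # Placeholder node and profile names, for illustration only.
    print(json.dumps(mig_label_patch("all-1g.5gb")))
    print(" ".join(kubectl_command("gpu-node-0", "all-1g.5gb")))
```

The patch body can be sent with any Kubernetes client that supports merge patches on Node objects; the sketch only constructs the request and does not contact a cluster.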