Performance and Scale for AI Platforms / PSAP-759

Comparing GPU vs vGPU Performance in RHEL/OpenShift

    • Type: Epic
    • Resolution: Done
    • Priority: Normal
    • July Release for PSAP
    • Done
    • Impediment
    • PSAP Sprint 221, PSAP Sprint 222, PSAP Sprint 223, PSAP Sprint 224, PSAP Sprint 225

      Epic Goal

      • Use ML benchmarks to assess the performance of vGPUs in VMs on RHEL and in OpenShift Virtualization, compared with direct GPU use.

      Why is this important?

      • This information is currently unknown or undocumented, and it is potentially desirable to customer(s).

      Scenarios

      1. Run the MLPerf SSD and SSDv2 training benchmarks and the nvidiadl BERT benchmark on bare-metal RHEL 8 with a single GPU.
      2. Start a single VM on RHEL 8 with a single vGPU (full capacity) and run the same benchmarks.
      3. Run multiple workloads on bare metal with a single GPU, using the same benchmarks.
      4. Start multiple VMs on RHEL 8 (as many VMs as there are workloads in scenario 3), each with a vGPU, and run the same benchmarks.
      5. Add SNO (single-node OpenShift) to bare metal and run the benchmarks in OpenShift with a single GPU.
      6. Add SNO to the single-VM and multi-VM environments and run the same benchmarks in OpenShift.
      7. Use OpenShift Virtualization to test VM/vGPU performance within OpenShift, using SNO on bare metal.
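
      The comparison these scenarios call for could be summarized with something like the sketch below. It is only an illustration, not the automation code from this epic: the `Run` record, the scenario labels, and the choice of wall-clock training time as the metric are all assumptions. It reports how much longer each non-baseline scenario takes than the baseline scenario for the same benchmark.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark run. All names here are hypothetical, for illustration."""
    scenario: str         # free-form label, e.g. "bare metal, single GPU"
    benchmark: str        # e.g. "SSD", "SSDv2", "BERT"
    train_seconds: float  # assumed metric: wall-clock time for the run


def relative_slowdown(baseline: Run, candidate: Run) -> float:
    """Return the candidate's extra time as a fraction of the baseline time.

    0.0 means equal time; a positive value means the candidate was slower.
    """
    if baseline.benchmark != candidate.benchmark:
        raise ValueError("runs must use the same benchmark")
    if baseline.train_seconds <= 0:
        raise ValueError("baseline time must be positive")
    return (candidate.train_seconds - baseline.train_seconds) / baseline.train_seconds


def compare(baseline_scenario: str, runs: list[Run]) -> dict[tuple[str, str], float]:
    """Map (scenario, benchmark) to slowdown relative to the baseline scenario.

    Runs whose benchmark has no baseline measurement are skipped.
    """
    baselines = {r.benchmark: r for r in runs if r.scenario == baseline_scenario}
    result: dict[tuple[str, str], float] = {}
    for r in runs:
        if r.scenario == baseline_scenario or r.benchmark not in baselines:
            continue
        result[(r.scenario, r.benchmark)] = relative_slowdown(baselines[r.benchmark], r)
    return result
```

      Keying the result by (scenario, benchmark) keeps each vGPU or OpenShift measurement paired with the direct-GPU run of the same benchmark, so a slowdown in one workload is not averaged away by another.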

      Acceptance Criteria

      • Document for performance report covering all scenarios
      • Potential presentation on results found
      • Automation code (if needed) cleaned, documented, and checked into repo

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      Open questions:

       

            meyceoz Mustafa Eyceoz (Inactive)