PSAP-487 (Performance and Scale for AI Platforms)

Parallel model serving (inference) performance with Multi Instance GPUs



      Build on the ongoing Multi-Instance GPU (MIG) performance benchmarking to show how GPU utilization can be improved by slicing a GPU (an A100 or A30 card) into instances and assigning each instance to one of several model-serving applications running in parallel (Triton may be used for model serving).
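
      As a rough illustration of the setup described above, the following Python sketch pairs each MIG instance with its own Triton server process. It assumes MIG mode is already enabled and the instances already exist, that instance UUIDs are read from `nvidia-smi -L`, and that each server is pinned to one instance through `CUDA_VISIBLE_DEVICES`. The helper names, the sample UUIDs, the model repository paths, and the port spacing are hypothetical and not taken from this ticket.

```python
import re

# Matches MIG device entries in `nvidia-smi -L` output, e.g.
#   MIG 3g.20gb     Device  0: (UUID: MIG-...)
# Full-GPU lines carry a "GPU-" UUID and are deliberately not matched.
MIG_UUID_RE = re.compile(r"\(UUID:\s*(MIG-[^)\s]+)\)")


def parse_mig_uuids(nvidia_smi_l_output):
    """Return the MIG device UUIDs found in `nvidia-smi -L` text, in order."""
    return MIG_UUID_RE.findall(nvidia_smi_l_output)


def build_launch_plan(mig_uuids, model_repos, base_port=8000):
    """Pair each model repository with one MIG instance and its own ports."""
    if len(model_repos) > len(mig_uuids):
        raise ValueError("more model repositories than MIG instances")
    plan = []
    for i, (uuid, repo) in enumerate(zip(mig_uuids, model_repos)):
        port = base_port + 10 * i  # hypothetical spacing to avoid port clashes
        plan.append({
            # Restrict this server process to a single MIG instance.
            "env": {"CUDA_VISIBLE_DEVICES": uuid},
            "cmd": [
                "tritonserver",
                f"--model-repository={repo}",
                f"--http-port={port}",
                f"--grpc-port={port + 1}",
                f"--metrics-port={port + 2}",
            ],
        })
    return plan


# Placeholder text shaped like `nvidia-smi -L` output; the UUIDs are made up.
SAMPLE = """\
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-aaaa)
  MIG 3g.20gb     Device  0: (UUID: MIG-1111)
  MIG 3g.20gb     Device  1: (UUID: MIG-2222)
"""

if __name__ == "__main__":
    for entry in build_launch_plan(parse_mig_uuids(SAMPLE),
                                   ["/models/a", "/models/b"]):
        print(entry["env"], " ".join(entry["cmd"]))
```

      Each plan entry could then be started with `subprocess.Popen(entry["cmd"], env={**os.environ, **entry["env"]})`, so that the servers run in parallel and per-instance throughput and utilization can be compared against a single server on the unsliced GPU.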

       

              yfama Yuchen Fama
              akamra8979 Ashish Kamra
