Performance and Scale for AI Platforms
PSAP-1066

Use NVIDIA MPS for multi-model inference


    • Type: Story
    • Resolution: Done
    • Priority: Normal
    • Component: AI/ML
    • Story Points: 3
    • Sprint: PSAP - General-5

      User Story:
      As a RHODS admin, I would like to increase the efficiency of my GPUs for multi-model inference.

      I want to test dynamic partitioning of GPU resources using MPS and https://github.com/nebuly-ai/nos,

      so that more models can share the GPUs efficiently, increasing inference throughput while maintaining an acceptable latency.

      Acceptance criteria:

      Report with MPS results and recommendations
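
      For context on what the test would exercise: MPS is enabled on a node through NVIDIA's standard control daemon, which lets multiple inference processes share a GPU's compute concurrently. A minimal sketch of the underlying mechanism (assumes an NVIDIA driver and at least one GPU are present; the 25% active-thread value is illustrative, not a recommendation):

      ```shell
      # Start the MPS control daemon for GPU 0 (requires an NVIDIA driver).
      export CUDA_VISIBLE_DEVICES=0
      nvidia-cuda-mps-control -d

      # Optionally cap the SM share each client process may use;
      # 25 is an illustrative value for packing ~4 model servers per GPU.
      export CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25

      # ...launch the model servers; CUDA clients attach to the daemon automatically...

      # Shut the daemon down once the experiment is over.
      echo quit | nvidia-cuda-mps-control
      ```

      On Kubernetes, nos is meant to automate this per-pod rather than requiring manual daemon management; the commands above only show the mechanism the benchmark would rely on.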

              ccamacho@redhat.com Carlos Camacho
              kpouget2 Kevin Pouget
