- Story
- Resolution: Done
- Normal
- None
- None
User Story:
As a RHODS admin, I would like to increase the efficiency of my GPUs for multi-model inference.
I want to test dynamic partitioning of GPU resources using MPS and https://github.com/nebuly-ai/nos
so that more models can share the GPUs efficiently, increasing inference throughput while maintaining acceptable latency.
Acceptance criteria:
A report with MPS test results and recommendations
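For context, MPS can also be enabled manually on a GPU node before testing nos's automated partitioning. The sketch below is a minimal manual setup, not the nos workflow; the directory paths and the 50% thread cap are illustrative values, and the commands require an NVIDIA GPU with a recent driver.

```shell
# Minimal sketch of manually enabling CUDA MPS on a GPU node
# (assumes an NVIDIA GPU and driver are present; values are illustrative).
export CUDA_VISIBLE_DEVICES=0                    # GPU to share between clients
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps   # daemon/client IPC directory
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log

nvidia-cuda-mps-control -d                       # start the MPS control daemon

# Optionally cap the fraction of SM threads each client may use,
# so several model servers share the GPU more predictably.
echo "set_default_active_thread_percentage 50" | nvidia-cuda-mps-control

# ... run the multi-model inference workloads and collect metrics ...

echo quit | nvidia-cuda-mps-control              # shut the MPS daemon down
```

nos is expected to manage equivalent per-node configuration automatically when its MPS partitioning mode is enabled, so the manual steps above are mainly useful as a baseline for comparison.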