-
Story
-
Resolution: Obsolete
-
Normal
-
None
Build on top of the ongoing Multi-Instance GPU (MIG) performance benchmarking to show how GPU utilization can be improved by slicing the GPU (an A100 or A30 card) into instances and assigning individual instances to multiple model-serving applications running in parallel (Triton may be used for model serving).
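A minimal sketch of how each serving process could be pinned to its own slice. It assumes MIG mode is already enabled and the instances already exist, that the output of `nvidia-smi -L` lists each MIG device with a `MIG-...` UUID, and that setting `CUDA_VISIBLE_DEVICES` to one of those UUIDs restricts a CUDA process to that instance. The helper names, server names, and sample listing (including its UUIDs) are hypothetical and only for illustration. Launching the model servers themselves is left out.

```python
import re

# Illustrative listing only: the UUIDs below are made up. On a real host the
# text would come from running `nvidia-smi -L`.
SAMPLE_LISTING = """\
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-00000000-0000-0000-0000-000000000000)
  MIG 3g.20gb     Device  0: (UUID: MIG-11111111-1111-1111-1111-111111111111)
  MIG 3g.20gb     Device  1: (UUID: MIG-22222222-2222-2222-2222-222222222222)
"""

# Matches only MIG device UUIDs, not the parent GPU UUID.
MIG_UUID_RE = re.compile(r"\(UUID:\s*(MIG-[^)\s]+)\)")


def parse_mig_uuids(listing: str) -> list[str]:
    """Return the MIG device UUIDs found in an `nvidia-smi -L` style listing."""
    return MIG_UUID_RE.findall(listing)


def assign_instances(mig_uuids: list[str], server_names: list[str]) -> dict[str, str]:
    """Map each serving application to one MIG instance, in listing order."""
    if len(server_names) > len(mig_uuids):
        raise ValueError(
            f"{len(server_names)} servers requested but only "
            f"{len(mig_uuids)} MIG instances available"
        )
    return dict(zip(server_names, mig_uuids))


def build_env(base_env: dict[str, str], mig_uuid: str) -> dict[str, str]:
    """Copy an environment and restrict CUDA visibility to one MIG instance."""
    env = dict(base_env)
    env["CUDA_VISIBLE_DEVICES"] = mig_uuid
    return env


if __name__ == "__main__":
    uuids = parse_mig_uuids(SAMPLE_LISTING)
    plan = assign_instances(uuids, ["server-a", "server-b"])
    for name, uuid in plan.items():
        print(name, build_env({}, uuid))
```

Each resulting environment could be passed as `env=` to `subprocess.Popen` when starting a serving process, so that every server sees exactly one slice of the card.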