- Story
- Resolution: Done
- Normal
- None
- None
User Story:
As a RHODS admin, I would like to increase the efficiency of my GPUs for multi-model inference.
I want to test dynamic partitioning of GPU resources using MPS and https://github.com/nebuly-ai/nos
so that more models can share the GPUs efficiently, increasing inference throughput while maintaining acceptable latency.
Acceptance criteria:
A report with MPS test results and recommendations
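For context, MPS can also be enabled manually on a GPU node before testing nos's automated partitioning. The sketch below is a minimal manual setup, not the nos workflow; the directory paths and the 50% thread cap are illustrative values, and the commands require an NVIDIA GPU with a recent driver.

```shell
# Minimal sketch of manually enabling CUDA MPS on a GPU node
# (assumes an NVIDIA GPU and driver are present; values are illustrative).
export CUDA_VISIBLE_DEVICES=0                    # GPU to share between clients
export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps   # daemon/client IPC directory
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log

nvidia-cuda-mps-control -d                       # start the MPS control daemon

# Optionally cap the fraction of SM threads each client may use,
# so several model servers share the GPU more predictably.
echo "set_default_active_thread_percentage 50" | nvidia-cuda-mps-control

# ... run the multi-model inference workloads and collect metrics ...

echo quit | nvidia-cuda-mps-control              # shut the MPS daemon down
```

nos is expected to manage equivalent per-node configuration automatically when its MPS partitioning mode is enabled, so the manual steps above are mainly useful as a baseline for comparison.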