-
Story
-
Resolution: Unresolved
-
Major
Background:
During the performance evaluation of SDG in RHELAI 1.3 on 4xH100, we noticed room for performance improvement by tuning the SDG and vLLM configuration.
Accelerator | SDG num-cpus | SDG batch-size | vLLM max-num-seqs | vLLM enable-prefix-caching | Samples generated | Sample generation time (s) | 70K projection (hr) | Notes
4xH100 | 10 | 8 | 256 | FALSE | 1568 | 506 | 6.274801587 | default SDG and vLLM config
4xH100 | 10 | 8 | 256 | TRUE | 1532 | 475 | 6.028793153 |
4xH100 | 16 | 256 | 1024 | FALSE | 1445 | 375 | 5.046136101 |
4xH100 | 16 | 256 | 1024 | TRUE | 1520 | 344 | 4.400584795 |
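The 70K projection column can be reproduced from the two measured columns. The sketch below assumes the sample generation timing is in seconds (the ticket does not state the unit, but this assumption matches the projected values) and that throughput scales linearly up to 70K samples. The function name, run labels, and constants are illustrative, not part of SDG or vLLM.

```python
# Assumption: generation timing is in seconds and throughput scales linearly.
TARGET_SAMPLES = 70_000
SECONDS_PER_HOUR = 3600

# (hypothetical label, samples generated, generation time in seconds)
RUNS = [
    ("default, prefix caching off", 1568, 506),
    ("default, prefix caching on", 1532, 475),
    ("tuned, prefix caching off", 1445, 375),
    ("tuned, prefix caching on", 1520, 344),
]


def projected_hours(samples, seconds, target=TARGET_SAMPLES):
    """Extrapolate the measured per-sample time to `target` samples, in hours."""
    return seconds / samples * target / SECONDS_PER_HOUR


projections = {label: projected_hours(n, t) for label, n, t in RUNS}
baseline = projections["default, prefix caching off"]

for label, hours in projections.items():
    # Speedup is relative to the default SDG and vLLM config.
    print(f"{label}: {hours:.2f} hr ({baseline / hours:.2f}x vs default)")
```

Comparing each run against the default row gives a single ratio to quote when proposing the system profile change.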
In summary:
- RHELAI-1.3 SDG on 4xH100 appears to perform better with a larger SDG batch size and with prefix caching enabled in vLLM. The best timing on 4xH100 (~4.4 hr projected for 70K knowledge samples, versus ~6.3 hr with the defaults) came from the following config:
- SDG: num-cpus 16, batch-size 256; vLLM: enable-prefix-caching, max-num-seqs 1024
Acceptance Criteria:
- Update the system profiles with the SDG and vLLM settings above to get the timing improvements.