Red Hat Enterprise Linux AI / RHELAI-2428

Potential RHELAI-1.3 SDG performance improvement on H100

      Background: 

      During the performance evaluation of synthetic data generation (SDG) in RHELAI-1.3 on 4xH100, we noticed room for performance improvement.

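      | Accelerator | SDG num-cpus | SDG batch-size | vLLM max-num-seqs | vLLM enable-prefix-caching | Samples generated | Sample generation timing (s) | 70K projection (hr) | Notes                       |
      |-------------|--------------|----------------|-------------------|----------------------------|-------------------|------------------------------|---------------------|-----------------------------|
      | 4xH100      | 10           | 8              | 256               | FALSE                      | 1568              | 506                          | 6.274801587         | default SDG and vLLM config |
      | 4xH100      | 10           | 8              | 256               | TRUE                       | 1532              | 475                          | 6.028793153         |                             |
      | 4xH100      | 16           | 256            | 1024              | FALSE                      | 1445              | 375                          | 5.046136101         |                             |
      | 4xH100      | 16           | 256            | 1024              | TRUE                       | 1520              | 344                          | 4.400584795         |                             |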
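The 70K projection column can be reproduced with a linear extrapolation of each measured run. The sketch below assumes the sample generation timing is in seconds, which is the unit that makes the arithmetic match the reported projections; the function and variable names are illustrative and not part of any RHEL AI tooling.

```python
TARGET_SAMPLES = 70_000
SECONDS_PER_HOUR = 3600


def project_hours(samples_generated, elapsed_seconds, target_samples=TARGET_SAMPLES):
    """Linearly extrapolate a measured run to target_samples, in hours."""
    seconds_per_sample = elapsed_seconds / samples_generated
    return seconds_per_sample * target_samples / SECONDS_PER_HOUR


# (num-cpus, batch-size, max-num-seqs, prefix caching, samples, seconds)
# Values copied from the measurements on 4xH100; seconds is an assumed unit.
runs = [
    (10, 8, 256, False, 1568, 506),
    (10, 8, 256, True, 1532, 475),
    (16, 256, 1024, False, 1445, 375),
    (16, 256, 1024, True, 1520, 344),
]

projections = [project_hours(samples, seconds) for *_, samples, seconds in runs]

for run, hours in zip(runs, projections):
    print(run[:4], round(hours, 2))
```

The extrapolation assumes throughput stays constant over the full 70K run, so it is an estimate rather than a measured end-to-end time.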

      In summary:

      • RHELAI-1.3 SDG on 4xH100 seems to perform better with a higher SDG batch size and with prefix caching enabled in the vLLM configuration. The best timing I got on 4xH100 (~4.4 hr projected for 70K knowledge samples) used the following config:
        • SDG: num-cpus 16, batch-size 256; vLLM: enable-prefix-caching, max-num-seqs 1024
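To put the gain in relative terms, the sketch below compares each configuration's 70K projection against the default. The projection values are copied from the table above; the labels and the helper name are illustrative only.

```python
# 70K projections in hours, copied from the measurement table.
projected_hours = {
    "default": 6.274801587,
    "default + prefix caching": 6.028793153,
    "tuned": 5.046136101,
    "tuned + prefix caching": 4.400584795,
}


def reduction_pct(hours, baseline_hours):
    """Percentage reduction in projected time relative to the baseline."""
    return (1 - hours / baseline_hours) * 100


baseline = projected_hours["default"]
reductions = {
    name: reduction_pct(hours, baseline)
    for name, hours in projected_hours.items()
}

for name, pct in reductions.items():
    print(f"{name}: {pct:.1f}% less projected time than default")
```

Here "tuned" stands for num-cpus 16, batch-size 256, and max-num-seqs 1024. Each configuration was measured in a single run, so small differences between rows may fall within run-to-run noise.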

      Acceptance Criteria:

      • Update the system profiles accordingly to get the timing improvements.

              akamra8979 Ashish Kamra
              npalaska@redhat.com Nikhil Palaskar