Performance and Scale for AI Platforms
PSAP-1626

Inference performance on cloud native AI accelerators


    • Type: Spike
    • Resolution: Obsolete
    • Priority: Normal
    • Jan 13
    • Future Sustainability
    • Inference, RHELAI
    • 8

      Use llm-load-test against vLLM to gather comparative performance data across AWS Neuron and Google TPUs for the llama-3.1-8b and granite-3 8b models:
      https://docs.vllm.ai/en/latest/getting_started/neuron-installation.html
      https://docs.vllm.ai/en/latest/getting_started/tpu-installation.html

      This is not a product ask (yet). It is forward-looking work to provide product guidance on which accelerator to prioritize for RHEL AI inference-only use cases.
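      The comparison this spike asks for ultimately reduces to normalizing each accelerator's run to a common throughput metric. A minimal sketch of that reduction step, assuming a hypothetical per-run summary schema (`output_tokens`, `duration_s`) rather than llm-load-test's actual output format:

      ```python
      # Sketch: comparing per-accelerator load-test summaries.
      # The summary schema here (output_tokens, duration_s) is an assumption
      # for illustration, not the real llm-load-test result format.

      def throughput_tok_per_s(summary: dict) -> float:
          """Output-token throughput (tokens/second) for one run."""
          return summary["output_tokens"] / summary["duration_s"]

      def compare_runs(runs: dict) -> dict:
          """Map each accelerator name to its output-token throughput."""
          return {name: throughput_tok_per_s(s) for name, s in runs.items()}

      if __name__ == "__main__":
          # Hypothetical numbers, purely to show the comparison shape.
          runs = {
              "aws-neuron": {"output_tokens": 120_000, "duration_s": 300.0},
              "google-tpu": {"output_tokens": 150_000, "duration_s": 300.0},
          }
          for name, tps in sorted(compare_runs(runs).items(),
                                  key=lambda kv: -kv[1]):
              print(f"{name}: {tps:.1f} output tok/s")
      ```

      Per-token latency percentiles would need the same normalization before the two accelerators can be ranked for a given model.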

              yfama Yuchen Fama
              akamra8979 Ashish Kamra
              Votes: 0
              Watchers: 1
