Performance and Scale for AI Platforms
PSAP-1626

Inference performance on cloud-native AI accelerators


    • Type: Spike
    • Resolution: Obsolete
    • Priority: Normal
    • Jan 13
    • Future Sustainability
    • Labels: Inference, RHELAI
    • Story Points: 8

      Use llm-load-test against vLLM to collect comparative performance data for the llama-3.1-8b and granite-3 8b models across AWS Neuron and Google TPUs:
      https://docs.vllm.ai/en/latest/getting_started/neuron-installation.html
      https://docs.vllm.ai/en/latest/getting_started/tpu-installation.html

      This is not a product ask (yet); it is forward-looking work to provide product guidance on which accelerator to prioritize for RHEL AI inference-only use cases.
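      For illustration, a minimal sketch of the kind of probe this comparison needs. This is not llm-load-test itself; the endpoint URL, model id, and prompt are placeholders, and it assumes a vLLM OpenAI-compatible server is already running on the Neuron or TPU instance:

      # Minimal single-request probe against a running vLLM server.
      # Assumptions: vLLM's OpenAI-compatible API is listening at BASE_URL
      # (e.g. started with `vllm serve <model>` on the accelerator instance);
      # MODEL and the prompt are placeholders. Each streamed SSE chunk is
      # counted as roughly one generated token, an approximation.
      import json
      import time
      import requests

      BASE_URL = "http://localhost:8000"  # placeholder endpoint
      MODEL = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder model id

      def measure(prompt: str, max_tokens: int = 128) -> dict:
          """Stream one completion; record time-to-first-token and chunk rate."""
          start = time.perf_counter()
          first_chunk_at = None
          chunks = 0
          resp = requests.post(
              f"{BASE_URL}/v1/completions",
              json={"model": MODEL, "prompt": prompt,
                    "max_tokens": max_tokens, "stream": True},
              stream=True,
              timeout=300,
          )
          resp.raise_for_status()
          for line in resp.iter_lines():
              if not line or not line.startswith(b"data: "):
                  continue
              payload = line[len(b"data: "):]
              if payload == b"[DONE]":
                  break
              if first_chunk_at is None:
                  first_chunk_at = time.perf_counter()
              chunks += 1
          total = time.perf_counter() - start
          return {
              "ttft_s": (first_chunk_at - start) if first_chunk_at else None,
              "total_s": total,
              "chunks_per_s": chunks / total if total else 0.0,
          }

      if __name__ == "__main__":
          print(json.dumps(measure("Summarize vLLM in one sentence."), indent=2))

      In the actual spike, llm-load-test would supply the concurrency, dataset, and reporting on top of this; the point is only that an identical request schedule can be replayed against the Neuron-backed and TPU-backed endpoints and the resulting latency/throughput numbers compared directly.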

              Assignee: Unassigned
              Reporter: Ashish Kamra (akamra8979)
              Votes: 0
              Watchers: 1
