Spike
Resolution: Obsolete
Normal
None
None
Use llm-load-test against vLLM to gather comparative performance data across AWS Neuron and Google TPUs for the Llama 3.1 8B and Granite 3 8B models:
https://docs.vllm.ai/en/latest/getting_started/neuron-installation.html
https://docs.vllm.ai/en/latest/getting_started/tpu-installation.html
This is not a product ask (yet); it is forward-looking work to provide product guidance on which accelerator to prioritize for RHEL AI inference-only use cases.
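Once both runs complete, the two result files need to be reduced to comparable numbers. A minimal sketch of that comparison step, assuming a hypothetical per-request JSON schema (`duration_s`, `output_tokens`, `response_time_s`) — llm-load-test's actual output format may differ:

```python
import json


def summarize(results: dict) -> dict:
    """Aggregate per-request records from one load-test run into summary metrics.

    The field names used here are assumptions for illustration, not the
    tool's real schema: `duration_s` is the wall-clock length of the run,
    and each entry in `requests` carries its generated token count and
    end-to-end response time.
    """
    latencies = sorted(r["response_time_s"] for r in results["requests"])
    total_tokens = sum(r["output_tokens"] for r in results["requests"])
    return {
        # Aggregate generation throughput over the whole run.
        "throughput_tok_per_s": total_tokens / results["duration_s"],
        # Median end-to-end request latency.
        "p50_latency_s": latencies[len(latencies) // 2],
    }


def compare(neuron_path: str, tpu_path: str) -> None:
    """Print side-by-side metrics and the Neuron/TPU ratio for each one."""
    with open(neuron_path) as f:
        neuron = summarize(json.load(f))
    with open(tpu_path) as f:
        tpu = summarize(json.load(f))
    for metric in neuron:
        ratio = neuron[metric] / tpu[metric]
        print(f"{metric}: neuron={neuron[metric]:.2f} "
              f"tpu={tpu[metric]:.2f} ratio={ratio:.2f}")
```

Whatever the real schema turns out to be, reducing each run to throughput and a latency percentile per model/accelerator pair gives the single table the prioritization guidance would be based on.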