Story
Resolution: Unresolved
Use GuideLLM (from Neural Magic) as an AI workload suitable for multi-arch CPU-mode testing.
Use two inferencing engines: llama.cpp and vLLM, which supports a CPU mode as documented at
https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html#set-up-using-docker
Document the investigation findings along with setup procedures and, if possible, initial test results.
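The steps above could be sketched roughly as follows. This is a hypothetical outline, not a verified procedure: the Dockerfile location, image flags, model name (Qwen/Qwen2.5-0.5B-Instruct is just a placeholder), and GuideLLM options are assumptions that may differ by vLLM and GuideLLM version, so check the linked docs before running.

```shell
# Build a CPU-only vLLM image from the vLLM source tree
# (Dockerfile path per the linked CPU installation docs; may vary by release).
docker build -f Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .

# Serve a small placeholder model in CPU mode on the default port 8000.
docker run --rm --network=host vllm-cpu-env \
    --model Qwen/Qwen2.5-0.5B-Instruct

# Point GuideLLM (pip install guidellm) at the OpenAI-compatible endpoint
# and run a short benchmark sweep; flags are assumptions to be confirmed
# against the GuideLLM CLI help.
guidellm benchmark \
    --target "http://localhost:8000" \
    --rate-type sweep \
    --max-seconds 60
```

The same GuideLLM invocation should work unchanged against a llama.cpp OpenAI-compatible server, which would allow a like-for-like comparison of the two engines on each architecture.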
Clones: RHELPERF-104 Investigate CPU-Mode AI Workloads for multi-arch performance testing
Status: In Progress