RHEL Performance / RHELPERF-118

Investigate GuideLLM workload harness for CPU-Mode inferencing performance comparison


    • Type: Story
    • Resolution: Unresolved

      Utilize GuideLLM (from Neural Magic) as an AI workload suitable for multi-arch CPU-mode testing.

      Use two inferencing engines: llama.cpp and vLLM, which supports CPU-mode serving as documented here:

      https://docs.vllm.ai/en/stable/getting_started/installation/cpu.html#set-up-using-docker
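
      The linked vLLM docs describe a Docker-based CPU setup; the sketch below follows that pattern. The Dockerfile path can vary by vLLM release, and the model name, KV-cache size, and core range are placeholders to adjust for the test machine.

```shell
# Sketch only: build the vLLM CPU image from source
# (Dockerfile location follows current vLLM docs; older
# releases kept Dockerfile.cpu at the repo root).
git clone https://github.com/vllm-project/vllm.git
cd vllm
docker build -f docker/Dockerfile.cpu -t vllm-cpu-env --shm-size=4g .

# Serve a model on the OpenAI-compatible endpoint (port 8000).
# VLLM_CPU_KVCACHE_SPACE (GiB of KV cache) and
# VLLM_CPU_OMP_THREADS_BIND (cores to pin) are the CPU-backend
# tuning knobs from the same docs; values here are examples.
docker run --rm --privileged=true --shm-size=4g -p 8000:8000 \
  -e VLLM_CPU_KVCACHE_SPACE=8 \
  -e VLLM_CPU_OMP_THREADS_BIND=0-29 \
  vllm-cpu-env --model=Qwen/Qwen2.5-1.5B-Instruct
```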

      Document investigation findings along with setup procedures and, if possible, initial test results.
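
      Since both engines can expose an OpenAI-compatible API, one GuideLLM invocation should work against either backend; the sketch below pairs it with llama.cpp's llama-server. The model path is a placeholder, and the GuideLLM flags follow its README and may differ across versions.

```shell
# Start llama.cpp's server on the same port the vLLM container
# would use, so the benchmark target stays identical.
llama-server -m ./models/model.gguf --host 0.0.0.0 --port 8000 &

pip install guidellm

# Sweep request rates against the running server and collect
# latency/throughput results for the comparison writeup.
guidellm benchmark \
  --target "http://localhost:8000" \
  --rate-type sweep \
  --max-seconds 30 \
  --data "prompt_tokens=256,output_tokens=128"
```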

              John Harrigan (jth@redhat.com)
