AI Platform Core Components / AIPCC-9510

Support vLLM on CPU for RHAIIS AVX512 AMX

    • Type: Feature
    • Resolution: Unresolved
    • Priority: Undefined
    • Development Platform
    • Blocked on upstream support.
    • 2026-Feb-06 (red): The auto-dispatch feature for AVX2 / AVX512 is still under development and is not available in vLLM 0.14. An AVX512 build does not work on older hardware.
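
      For context, the gap between a dedicated AVX512 build and an auto-dispatching build is whether the instruction-set choice happens at compile time or at run time. Below is a minimal run-time detection sketch for Linux, assuming the /proc/cpuinfo flag names avx2, avx512f, and amx_tile; it illustrates the dispatch decision only and is not vLLM's actual dispatch code.

      ```python
      # Illustrative only: choose the widest x86 SIMD/AMX path the host supports
      # at run time, the way an auto-dispatching build would. A build compiled
      # exclusively for AVX512 skips this check, which is why it faults on CPUs
      # that only have AVX2.
      def cpu_flags() -> set[str]:
          """Return the CPU feature flags reported by the Linux kernel."""
          with open("/proc/cpuinfo") as f:
              for line in f:
                  if line.startswith("flags"):
                      return set(line.split(":", 1)[1].split())
          return set()

      def select_isa() -> str:
          flags = cpu_flags()
          if "amx_tile" in flags:      # AVX512 AMX (Sapphire Rapids and later)
              return "avx512-amx"
          if "avx512f" in flags:       # AVX512 foundation
              return "avx512"
          if "avx2" in flags:
              return "avx2"
          return "generic"

      if __name__ == "__main__":
          print(f"Selected kernel path: {select_isa()}")
      ```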

      Feature Refinement Doc

      The current distribution of vLLM supports NVIDIA GPUs, Intel Gaudi, and AMD ROCm. It would be valuable to have a version of vLLM capable of running smaller models on CPU alone, without a GPU.

      The strategy is limited to x86 support only.

      List of models to validate for the initial support:

      • TinyLlama-1.1B-Chat-v1.0
      • Llama-3.2-1B-Instruct
      • granite-3.2-2b-instruct
      • TinyLlama-1.1B-Chat-v1.0-pruned2.4
      • TinyLlama-1.1B-Chat-v1.0-marlin
      • TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds
      • facebook/opt-125m
      • Qwen2-0.5B-Instruct-AWQ
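
      As a quick smoke test for any of the models above, the sketch below runs offline inference through vLLM's Python API on a CPU-only build. VLLM_CPU_KVCACHE_SPACE (the KV-cache budget in GiB) is an existing vLLM CPU-backend setting, but the chosen value, the dtype, and the full Hugging Face model id are illustrative assumptions.

      ```python
      import os

      # The CPU backend reads its KV-cache budget (in GiB) from this variable;
      # set it before importing vllm. The value 4 is an arbitrary choice.
      os.environ.setdefault("VLLM_CPU_KVCACHE_SPACE", "4")

      from vllm import LLM, SamplingParams

      # Any of the small validation models listed above should fit on CPU;
      # the full Hugging Face id below is an assumption for the list entry.
      llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", dtype="bfloat16")

      params = SamplingParams(temperature=0.0, max_tokens=64)
      for out in llm.generate(["What does AVX512 AMX accelerate?"], params):
          print(out.outputs[0].text)
      ```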

      Model performance evaluation resources / guides:

      Midstream INFERENG CPU image build:
      quay.io/vllm/automation-vllm:cpu-19905651936

      In addition to the first delivery in RHAIIS 3.3, which supports AVX2, this second delivery should support AVX2, AVX512, and AVX512 AMX in the same build.
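
      For end-to-end validation of the served image, one option is to start the container (for example with podman, publishing port 8000) and probe vLLM's OpenAI-compatible API. Below is a sketch using the openai Python client; the port, model name, and endpoint path are assumptions about the image's default vllm serve configuration.

      ```python
      # Assumes a vLLM server is already listening locally, e.g. started from the
      # midstream CPU image above with its default entrypoint on port 8000.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

      resp = client.completions.create(
          model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # whichever model the server loaded
          prompt="Reply with one sentence from a CPU-only vLLM build.",
          max_tokens=32,
      )
      print(resp.choices[0].text)
      ```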

              Meirav Dean (mdean@redhat.com)
              Paige Vauter (paigevauter)