Type: Feature
Resolution: Unresolved
Priority: Critical
Feature title: Build vllm components and images for CPU-only systems
Feature Overview:
Several things drive the need for this work:
- Batch inferencing jobs running on large systems with x86, Power, and Z CPUs do not need the "realtime" response time provided by hosts with hardware accelerators.
- Components of the system, such as llama-stack, benefit from having a vLLM build that can run inline in a pod on any system to perform simple inferencing with small models (see the sketch below).
- Partners outside of Red Hat who will provide vLLM or torch plugins need the CPU builds of those libraries to drive their plugins.
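For illustration, a minimal sketch of the "inline in a pod" case: running vLLM's offline API on a CPU-only host with one of the small models listed later in this feature. The model choice and sampling settings are placeholders, and a CPU-enabled vLLM build is assumed to be installed already.

```python
# Minimal CPU-only smoke test with vLLM's offline API (no GPU required).
# Assumes a CPU build of vLLM is installed; the model choice is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # small model suitable for CPU inference
params = SamplingParams(temperature=0.0, max_tokens=32)

outputs = llm.generate(["What is Red Hat AI Inference Server?"], params)
for out in outputs:
    print(out.outputs[0].text)
```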
Product(s) associated:
RHAIIS: Yes
RHEL AI: No
RHOAI: Yes
Goals:
- We need to provide CPU-only builds of PyTorch and vLLM for all CPU architectures.
- We need to provide CPU-only builds of the vLLM image in RHAIIS for all CPU architectures.
Requirements:
- CPU arch and optimizations:
  - x86_64 with AVX2 optimization (see the verification sketch below)
- Torch 2.9.1
- vLLM 0.13
- RHAIIS vLLM image
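As a hedged illustration of how a consuming team might verify the CPU-only builds: the exact wheel versions and install source are whatever AIPCC ships, and the capability check relies on PyTorch's torch.backends.cpu.get_cpu_capability(), available in recent releases.

```python
# Quick sanity check of a CPU-only PyTorch/vLLM install.
# Versions below mirror the requirements; adjust to the wheels actually shipped.
import torch
import vllm

print("torch:", torch.__version__)                    # expected to report 2.9.1
print("vllm:", vllm.__version__)                      # expected to report 0.13.x
print("CUDA available:", torch.cuda.is_available())   # False on a CPU-only build

# Reports the instruction set PyTorch is using on this host, e.g. "AVX2" or
# "AVX512"; helps confirm the AVX2-optimized build is active.
print("CPU capability:", torch.backends.cpu.get_cpu_capability())
```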
Done - Acceptance Criteria:
- Component teams can install vLLM and torch into their images using AIPCC base images without hardware accelerator support.
- Partners can build on the RHAIIS CPU image to add their own plugins, providing accelerator support for accelerator types not built inside Red Hat (a hedged plugin sketch follows this list).
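To illustrate the partner workflow, a minimal sketch of how an out-of-tree accelerator plugin could register itself through vLLM's entry-point based plugin system. The package, module, and class names here are hypothetical, and the actual integration points for a given accelerator may differ; this only shows the registration shape described in upstream vLLM's plugin documentation.

```python
# my_accel_plugin/__init__.py -- hypothetical partner package layered on top of
# the RHAIIS CPU image. vLLM discovers plugins through Python entry points; the
# function below would be wired up in the package metadata, e.g.:
#   [project.entry-points."vllm.platform_plugins"]
#   my_accel = "my_accel_plugin:register"
def register() -> str | None:
    """Return the import path of the partner's Platform class, or None
    if the accelerator is not present on this host."""
    try:
        import my_accel_runtime  # hypothetical vendor runtime library
    except ImportError:
        return None
    return "my_accel_plugin.platform.MyAccelPlatform"
```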
Use Cases - i.e. User Experience & Workflow:
Include use case diagrams, main success scenarios, alternative flow scenarios.
Out of Scope:
CPU arches and optimizations:
- aarch64 with ARM compute library (via oneDNN)
- ppc64le / Power
- s390x / Z
- x86_64 AVX512 (via oneDNN)
We plan to deliver ARM, Power, Z, and AVX512 support in 3.4EA1.
Additional AVX512 optimizations for the x86_64v4 ISA depend on new features in vLLM 0.14+. vLLM 0.13 can be compiled for either AVX2 or AVX512, and an AVX512 build does not work on older CPUs. Upcoming releases will be able to detect CPU capabilities and select the optimal implementation at runtime (a detection sketch follows below).
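As an illustration of why compile-time selection matters today, a small Linux-only sketch that reads the host's CPU flags from /proc/cpuinfo to decide whether an AVX512 build could even run. The flag names are the standard Linux cpuinfo flags; the decision logic is only an example, not how vLLM itself will do runtime dispatch.

```python
# Decide which vLLM CPU build a host can run, based on Linux /proc/cpuinfo.
# AVX512 builds require avx512f; AVX2 builds only need avx2.
def host_cpu_flags() -> set[str]:
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

flags = host_cpu_flags()
if "avx512f" in flags:
    print("Host supports AVX512 -- either build would run.")
elif "avx2" in flags:
    print("Host supports AVX2 only -- an AVX512 build would fail here; use the AVX2 build.")
else:
    print("Host has neither AVX2 nor AVX512 -- unsupported for the planned builds.")
```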
Documentation Considerations:
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation.
Original Request:
Building vLLM to run on CPU-only systems (no GPU) for smaller models.
List of models to validate for the initial support:
- TinyLlama-1.1B-Chat-v1.0
- Llama-3.2-1B-Instruct
- granite-3.2-2b-instruct
- TinyLlama-1.1B-Chat-v1.0-pruned2.4
- TinyLlama-1.1B-Chat-v1.0-marlin
- TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds
- facebook/opt-125m
- Qwen2-0.5B-Instruct-AWQ
GuideLLM benchmarks:
https://developers.redhat.com/articles/2025/06/17/how-run-vllm-cpus-openshift-gpu-free-inference
vLLM (CPU) Performance Evaluation Guide
Midstream INFERENG CPU image build:
quay.io/vllm/automation-vllm:cpu-19905651936
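As an illustrative smoke test against such a CPU deployment: this assumes the image is run with vLLM's standard OpenAI-compatible server listening on port 8000 and that one of the listed models has been loaded; the endpoint, key, and model name are examples, not the validated configuration.

```python
# Smoke-test a CPU vLLM server (e.g. started from the midstream CPU image)
# through its OpenAI-compatible API. Endpoint, key, and model are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.completions.create(
    model="facebook/opt-125m",          # one of the models listed above
    prompt="Hello from a CPU-only vLLM deployment:",
    max_tokens=32,
    temperature=0.0,
)
print(resp.choices[0].text)
```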
is cloned by:
- AIPCC-8765 Build vllm components and images for CPU-only systems (aarch64, Power, Z, x86_64v4) (Refinement)
- AIPCC-8766 Build vllm components and images for CPU-only x86_64 AVX512 systems (Closed)
relates to:
- AIPCC-7460 Build Python wheels on IBM Power to publish them to RH Public index (Review)