Type: Feature
Resolution: Done
Priority: Major
Fix Version: RHAIIS-3.2.1
Feature Overview (mandatory - Complete while in New status)
Build the vLLM components needed for Red Hat AI products to run efficiently on Google Cloud TPUs.
TPU support via vLLM unlocks high-performance, cost-effective inference at scale for RHAIIS customers operating on Google Cloud. By leveraging the topology-aware resource allocation of Kubernetes Dynamic Resource Allocation (DRA), this integration will simplify the user experience and abstract away the complexity of TPU-specific configuration.
Supported TPUs:
- v4
- v5e
- v5p
- Trillium (v6e)
Products planned to use this build:
RHAIIS: Yes
RHEL AI: No (for now)
RHOAI: No (for now)
Goals (mandatory - Complete while in New status)
Who benefits from this Feature, and how?
Data scientists, ML engineers, and platform administrators using RHAIIS on GCP gain access to Google TPUs with minimal manual configuration, more reliable scheduling, and performance optimized for LLM inference workloads.
What is the difference between today’s current state and a world with this Feature?
Today, Red Hat AI does not support TPU workloads natively through vLLM. Users must manage their own configurations, manually match topologies, and handle complex scheduling issues. With this feature, vLLM on RHAIIS will support TPUs with native integration, simplified deployment, and DRA-based topology alignment.
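For contrast, here is a minimal sketch of the manual approach today on GKE, using GKE's TPU node labels and the google.com/tpu extended resource; the image name and concrete values are illustrative and must agree with the provisioned node pool:

```yaml
# Today: the user must hand-match accelerator type, topology, and chip
# count. Node labels and the resource name follow GKE's TPU conventions;
# the image name is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-tpu-manual
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice  # v5e
    cloud.google.com/gke-tpu-topology: "2x4"                    # 8 chips
  containers:
  - name: vllm
    image: registry.example.com/rhaiis/vllm-tpu:latest  # hypothetical image
    resources:
      limits:
        google.com/tpu: 8  # must equal the chip count implied by the topology
```

Any mismatch between the topology label, the chip count, and the node pool leaves the pod unschedulable, which is the burden DRA-based topology alignment removes.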
Requirements (mandatory - Complete while in Refinement status)
Requirement | Notes | isMVP?
---|---|---
New variant defined for the wheel builder image | | Yes
New collection variant defined for the RHAIIS vLLM collection | | Yes
Done - Acceptance Criteria (mandatory - Complete while in Refinement status)
- Users can deploy vLLM workloads on Google TPUs using RHAIIS.
- Wheel builder pipeline builds a TPU-compatible runtime variant.
- DRA resource claims correctly resolve valid TPU topologies via the matchAttribute constraint (see the ResourceClaim sketch after this list).
- TPU support is included in RHAIIS documentation with usage examples.
- vLLM integration passes CI and E2E tests on GCP TPU nodes.
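A minimal sketch of the kind of ResourceClaim the matchAttribute criterion refers to, using the Kubernetes DRA v1beta1 API; the DeviceClass name and attribute name are assumptions, since the TPU DRA driver's identifiers are not specified in this document:

```yaml
# A ResourceClaim requesting 4 TPU devices whose topology attribute must
# match, so the scheduler only allocates chips from one valid slice.
# deviceClassName and the attribute name are illustrative; the real
# identifiers come from the TPU DRA driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: tpu-slice
spec:
  devices:
    requests:
    - name: tpus
      deviceClassName: tpu.google.com      # hypothetical DeviceClass
      allocationMode: ExactCount
      count: 4
    constraints:
    - requests: ["tpus"]
      matchAttribute: google.com/topology  # hypothetical attribute
```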
Use Cases - i.e. User Experience & Workflow
- A data scientist deploys an LLM inference workload to RHAIIS on Google TPUs without manually configuring TPU topology constraints.
- A platform administrator configures TPU-backed node pools in RHAIIS and deploys vLLM workloads using DRA ResourceClaims abstracted through the UI or YAML templates (see the pod sketch after this list).
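A hedged sketch of the pod-side wiring for such a claim, assuming a prebuilt RHAIIS vLLM TPU image (image name and model are illustrative):

```yaml
# The pod references the claim; kube-scheduler resolves a valid TPU
# topology before binding, so no nodeSelector matching is needed.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-tpu
spec:
  resourceClaims:
  - name: tpus
    resourceClaimName: tpu-slice  # the claim sketched above
  containers:
  - name: vllm
    image: registry.example.com/rhaiis/vllm-tpu:latest  # hypothetical image
    args: ["serve", "meta-llama/Llama-3.1-8B-Instruct"]  # illustrative model
    resources:
      claims:
      - name: tpus
```

Because allocation happens through the claim, the pod is bound only once a valid topology is available, which is what removes the manual matching described in the Goals section.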
Out of Scope
- Support for TPUs outside of Google Cloud.
- Training workloads (only inference via vLLM is targeted).
Documentation Considerations
- Reference to supported GCP regions and TPU types.
Background and Strategic Fit
As Google TPU usage increases across generative AI workloads, supporting TPU on Red Hat AI via vLLM provides a strategic advantage in cloud-agnostic AI infrastructure. With Kubernetes DRA reaching maturity (targeted GA in K8s 1.33), Red Hat can lead by integrating AI inference infrastructure with cutting-edge scheduling and topology management.
Customer Considerations
- Large-scale enterprise users who are standardizing on Google Cloud AI hardware.
- Customers needing automated TPU provisioning for large LLM inferencing at scale.
Team Sign Off (Completion while in Planning status)
Reviewed By: [TBD]
Team Name: AIPCC
Accepted: [ ]
Notes:
- FixVersion: [TBD – Pending team capacity review]
- All epics and dependent stories to be created once TPU runtime and DRA integration story is finalized.
Issue Links
- is depended on by: AIPCC-4113 vllm-0.10.0 Google TPU work (Closed)