Type: Feature
Resolution: Done
Priority: Major
Fix Version: RHAIIS-3.2.1
Feature Overview (mandatory - Complete while in New status)
Build the vLLM components needed for Red Hat AI products to run efficiently on Google Cloud TPUs.
TPU support via vLLM unlocks high-performance, cost-effective inference at scale for RHAIIS customers operating on Google Cloud. By leveraging the topology-aware resource allocation of Kubernetes Dynamic Resource Allocation (DRA), this integration will simplify the user experience and abstract away the complexity of TPU-specific configuration.
Supported TPUs:
- v4
- v5e
- v5p
- Trillium (v6e)
Products planned to use this build:
RHAIIS: Yes
RHEL AI: No (for now)
RHOAI: No (for now)
Goals (mandatory - Complete while in New status)
Who benefits from this Feature, and how?
Data scientists, ML engineers, and platform administrators using RHAIIS on GCP gain access to Google TPUs with minimal manual configuration, more reliable scheduling, and performance optimized for LLM inference workloads.
What is the difference between today’s current state and a world with this Feature?
Today, Red Hat AI does not support TPU workloads natively through vLLM. Users must manage their own configurations, manually match topologies, and handle complex scheduling issues. With this feature, vLLM on RHAIIS will support TPUs with native integration, simplified deployment, and DRA-based topology alignment.
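For contrast, here is a minimal sketch of the manual approach today on GKE, using GKE's TPU node labels and the google.com/tpu extended resource; the image name and concrete values are illustrative and must agree with the provisioned node pool:

```yaml
# Today: the user must hand-match accelerator type, topology, and chip
# count. Node labels and the resource name follow GKE's TPU conventions;
# the image name is hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-tpu-manual
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice  # v5e
    cloud.google.com/gke-tpu-topology: "2x4"                    # 8 chips
  containers:
  - name: vllm
    image: registry.example.com/rhaiis/vllm-tpu:latest  # hypothetical image
    resources:
      limits:
        google.com/tpu: 8  # must equal the chip count implied by the topology
```

Any mismatch between the topology label, the chip count, and the node pool leaves the pod unschedulable, which is the burden DRA-based topology alignment removes.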
Requirements (mandatory - Complete while in Refinement status)
Requirement | Notes | isMVP?
---|---|---
New variant defined for the wheel builder image | | Yes
New collection variant defined for the RHAIIS vLLM collection | | Yes
Done - Acceptance Criteria (mandatory - Complete while in Refinement status)
- Users can deploy vLLM workloads on Google TPUs using RHAIIS.
- Wheel builder pipeline builds a TPU-compatible runtime variant.
- DRA resource claims correctly resolve valid TPU topologies via the matchAttribute constraint (see the ResourceClaim sketch after this list).
- TPU support is included in RHAIIS documentation with usage examples.
- vLLM integration passes CI and E2E tests on GCP TPU nodes.
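A minimal sketch of the kind of ResourceClaim the matchAttribute criterion refers to, using the Kubernetes DRA v1beta1 API; the DeviceClass name and attribute name are assumptions, since the TPU DRA driver's identifiers are not specified in this document:

```yaml
# A ResourceClaim requesting 4 TPU devices whose topology attribute must
# match, so the scheduler only allocates chips from one valid slice.
# deviceClassName and the attribute name are illustrative; the real
# identifiers come from the TPU DRA driver.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: tpu-slice
spec:
  devices:
    requests:
    - name: tpus
      deviceClassName: tpu.google.com      # hypothetical DeviceClass
      allocationMode: ExactCount
      count: 4
    constraints:
    - requests: ["tpus"]
      matchAttribute: google.com/topology  # hypothetical attribute
```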
Use Cases - i.e. User Experience & Workflow
- A data scientist deploys an LLM inference workload to RHAIIS on Google TPUs without manually configuring TPU topology constraints.
- A platform administrator configures TPU-backed node pools in RHAIIS and deploys vLLM workloads using DRA ResourceClaims abstracted through the UI or YAML templates (see the pod sketch after this list).
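A hedged sketch of the pod-side wiring for such a claim, assuming a prebuilt RHAIIS vLLM TPU image (image name and model are illustrative):

```yaml
# The pod references the claim; kube-scheduler resolves a valid TPU
# topology before binding, so no nodeSelector matching is needed.
apiVersion: v1
kind: Pod
metadata:
  name: vllm-tpu
spec:
  resourceClaims:
  - name: tpus
    resourceClaimName: tpu-slice  # the claim sketched above
  containers:
  - name: vllm
    image: registry.example.com/rhaiis/vllm-tpu:latest  # hypothetical image
    args: ["serve", "meta-llama/Llama-3.1-8B-Instruct"]  # illustrative model
    resources:
      claims:
      - name: tpus
```

Because allocation happens through the claim, the pod is bound only once a valid topology is available, which is what removes the manual matching described in the Goals section.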
Out of Scope
- Support for TPUs outside of Google Cloud.
- Training workloads (only inference via vLLM is targeted).
Documentation Considerations
- Reference to supported GCP regions and TPU types.
Background and Strategic Fit
As Google TPU usage increases across generative AI workloads, supporting TPU on Red Hat AI via vLLM provides a strategic advantage in cloud-agnostic AI infrastructure. With Kubernetes DRA reaching maturity (targeted GA in K8s 1.33), Red Hat can lead by integrating AI inference infrastructure with cutting-edge scheduling and topology management.
Customer Considerations
- Large-scale enterprise users who are standardizing on Google Cloud AI hardware.
- Customers needing automated TPU provisioning for large LLM inferencing at scale.
Team Sign Off (Completion while in Planning status)
Reviewed By: [TBD]
Team Name: AIPCC
Accepted: [ ]
Notes:
- FixVersion: [TBD – Pending team capacity review]
- All epics and dependent stories to be created once TPU runtime and DRA integration story is finalized.
Issue Links
- is depended on by: AIPCC-4113 vllm-0.10.0 Google TPU work (Closed)