Project: AI Platform Core Components
Issue: AIPCC-1446

Build vLLM for Google TPU - Tech Preview

    • Progress: 0% To Do, 0% In Progress, 100% Done

      Status 03/Sept/2025: GREEN

      Work completed; the feature will be closed upon review in the feature planning call on Thursday.

      Feature Overview (mandatory - Complete while in New status)

      Build the vLLM components needed for Red Hat AI products to run efficiently on Google Cloud TPUs.

      Support for TPUs via vLLM unlocks high-performance, cost-effective inference at scale for RHAIIS customers operating on Google Cloud. By leveraging DRA’s topology-aware resource allocation, this integration simplifies the user experience and abstracts away the complexity of TPU-specific configuration.

      Supported TPUs:

      • v5e
      • Trillium
      • v4
      • v5p

      Products planned to use this build:

      RHAIIS: Yes
      RHEL AI: No (for now)
      RHOAI: No (for now)

      Goals (mandatory - Complete while in New status)

      Who benefits from this Feature, and how?

      Data scientists, ML engineers, and platform admins using RHAIIS on GCP benefit from this feature by gaining access to Google TPUs with minimal manual configuration, improved scheduling success, and optimized performance for LLM inference workloads.

      What is the difference between today’s current state and a world with this Feature?

      Today, Red Hat AI does not support TPU workloads natively through vLLM. Users must manage their own configurations, manually match topologies, and handle complex scheduling issues. With this feature, vLLM on RHAIIS will support TPUs with native integration, simplified deployment, and DRA-based topology alignment.

      Requirements (mandatory - Complete while in Refinement status)

      Requirement                                                      Notes   isMVP?
      New variant defined for wheel builder image                              Yes
      New collection variant defined for RHAIIS collection for vLLM            Yes

      Done - Acceptance Criteria (mandatory - Complete while in Refinement status)

      • Users can deploy vLLM workloads on Google TPUs using RHAIIS.
      • Wheel builder pipeline builds a TPU-compatible runtime variant.
      • DRA resource claims correctly resolve valid TPU topologies with matchAttribute.
      • TPU support is included in RHAIIS documentation with usage examples.
      • vLLM integration passes CI and E2E tests on GCP TPU nodes.
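
      As an illustration of the matchAttribute criterion above, a DRA ResourceClaim for TPUs might look like the sketch below. The device class name and attribute key are hypothetical placeholders; the actual names depend on the Google TPU DRA driver.

      ```yaml
      # Hypothetical sketch: request 4 TPU devices and require that they all
      # share the same topology attribute value, so the allocation lands
      # within one valid TPU topology. Class and attribute names are
      # illustrative only, not confirmed driver identifiers.
      apiVersion: resource.k8s.io/v1beta1
      kind: ResourceClaim
      metadata:
        name: tpu-inference-claim
      spec:
        devices:
          requests:
            - name: tpus
              deviceClassName: tpu.google.com   # assumed device class
              count: 4
          constraints:
            - requests: ["tpus"]
              # matchAttribute forces all requested devices to carry the same
              # value for this attribute across the allocation.
              matchAttribute: tpu.google.com/topology
      ```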

      Use Cases - i.e. User Experience & Workflow

      • A data scientist deploys an LLM inference workload to RHAIIS using Google TPUs without needing to manually configure TPU topology constraints.
      • A platform administrator configures TPU-backed node pools in RHAIIS and deploys vLLM workloads using DRA ResourceClaims abstracted through the UI or YAML templates.
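
      The administrator workflow above can be sketched as a pod that consumes a pre-created DRA ResourceClaim through the pod-level resourceClaims field. The image, model, and claim names are illustrative assumptions, not the shipped RHAIIS artifact names.

      ```yaml
      # Hypothetical sketch of a vLLM pod consuming a TPU ResourceClaim via DRA.
      apiVersion: v1
      kind: Pod
      metadata:
        name: vllm-tpu-inference
      spec:
        containers:
          - name: vllm
            image: registry.example.com/rhaiis/vllm-tpu:latest  # illustrative image
            args: ["--model", "my-org/my-llm"]                  # illustrative model
            resources:
              claims:
                - name: tpus   # refers to the pod-level resourceClaims entry below
        resourceClaims:
          - name: tpus
            resourceClaimName: tpu-inference-claim  # pre-created DRA claim (illustrative)
      ```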

      Out of Scope

      • Support for TPUs outside Google Cloud.
      • Training workloads (only inference via vLLM is targeted).

      Documentation Considerations

      • Reference to supported GCP regions and TPU types.

      Background and Strategic Fit

      As Google TPU usage increases across generative AI workloads, supporting TPU on Red Hat AI via vLLM provides a strategic advantage in cloud-agnostic AI infrastructure. With Kubernetes DRA reaching maturity (targeted GA in K8s 1.33), Red Hat can lead by integrating AI inference infrastructure with cutting-edge scheduling and topology management.

      Customer Considerations


      • Large-scale enterprise users who are standardizing on Google Cloud AI hardware.
      • Customers needing automated TPU provisioning for large LLM inferencing at scale.

      Team Sign Off (Completion while in Planning status)


      Reviewed By: [TBD]

      Team Name: AIPCC

      Accepted: [ ]

      Notes:

      • FixVersion: [TBD – Pending team capacity review]
      • All epics and dependent stories to be created once the TPU runtime and DRA integration story is finalized.


      People: Frank Jansen (fjansen@redhat.com), Taneem Ibrahim (rhn-support-tibrahim), Ali Raza