AI Platform Core Components / AIPCC-2091

SPIKE: Add tpu-ubi9 variant for building vLLM with TPU support using PyTorch XLA

    • Issue type: Spike
    • Resolution: Done
    • Accelerator Enablement
    • AIPCC-1446 - Build vLLM for Google TPU - Tech Preview
    • AIPCC Accelerators 9, AIPCC Accelerators 10, AIPCC Accelerators 11

      We're adding TPU support to the builder by creating a new tpu-ubi9 variant that builds vLLM with PyTorch XLA for Google Cloud TPUs.

      This is an initial division of tasks. As the work progresses, some of these will be further subdivided. For example, the sub-tasks in Tasks 1 and 2 will definitely be subdivided because they involve multiple steps. New tasks and sub-tasks will also be added as more work is identified.

      Task 1: Container setup

      Set up TPU build container with required development tools

      Sub-tasks:

      1. Install Clang compiler (JAX recommends Clang)
      2. Install Bazel build system for JAX compilation
      3. Install CMake, the Ninja build system, Python development headers, and other supporting tools
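      As a quick sanity check for Task 1, the finished container could run a small script that confirms the compilers and build tools are on PATH and that the Python headers are installed. This is a minimal sketch: the executable names (`clang`, `bazel`, `cmake`, `ninja`) are the usual ones and are assumed here, and `check_prerequisites` is a hypothetical helper, not part of the builder.

```python
import os
import shutil
import sysconfig

# Executables assumed to be needed in the tpu-ubi9 build container (hypothetical list;
# adjust if Bazel is installed as `bazelisk` instead of `bazel`).
REQUIRED_TOOLS = ("clang", "bazel", "cmake", "ninja")


def find_missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_headers_present():
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


def check_prerequisites(tools=REQUIRED_TOOLS):
    """Return a list of missing prerequisites; an empty list means all were found."""
    missing = find_missing_tools(tools)
    if not python_headers_present():
        missing.append("Python.h (Python development headers)")
    return missing
```

      Calling `check_prerequisites()` as the last step of the image build, and failing on a non-empty result, would surface a missing compiler before any package build starts.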

      Task 2: Package build plugins

      Create build plugins to compile TPU dependencies from source

      Sub-tasks:

      1. Extend PyTorch plugin to verify TPU builds
      2. Create PyTorch XLA build plugin
      3. Create JAX/jaxlib build plugin with Bazel support
      4. Update vLLM plugin with TPU dependency
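      The plugins in Task 2 imply a build ordering: PyTorch XLA builds against PyTorch, jax depends on jaxlib, and the TPU build of vLLM is assumed here to need PyTorch XLA and JAX. The sketch below derives an order from such a dependency map using Python's standard `graphlib` module (Python 3.9+). The edges are illustrative assumptions; the real graph would come from the packages' own requirements.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each package maps to the packages that must be
# available before it is built.
TPU_BUILD_DEPS = {
    "torch": set(),
    "jaxlib": set(),
    "jax": {"jaxlib"},
    "torch_xla": {"torch"},
    "vllm": {"torch", "torch_xla", "jax"},
}


def build_order(deps):
    """Return packages so that every dependency precedes its dependents.

    Raises graphlib.CycleError if the map contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())
```

      Because every other package is an ancestor of `vllm` in this map, `vllm` always comes last, while independent chains such as torch/torch_xla and jaxlib/jax may be interleaved.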

      Task 3: Build configuration

      Configure build arguments and package collections for TPU

      Sub-tasks:

      1. Create TPU build arguments file with compiler versions and source repos
      2. Set up TPU package collection (collections/accelerated/tpu-ubi9/)
      3. Add TPU-specific build environment to vLLM settings (overrides/settings/vllm.yaml)
      4. Register the TPU variant in the build system (i.e., changes to the Makefile)
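      Sub-task 3 adds a TPU-specific build environment for vLLM. In vLLM versions with in-tree TPU support, the build reads the `VLLM_TARGET_DEVICE` environment variable to select the backend, and `tpu` is one of its accepted values. The sketch below shows one way per-variant settings could be layered over defaults. The settings layout, the `MAX_JOBS` default, and the `cuda-ubi9` entry are hypothetical stand-ins; the actual schema of `overrides/settings/vllm.yaml` is not described here.

```python
from typing import Dict, Mapping

# Hypothetical stand-in for per-variant settings; the real
# overrides/settings/vllm.yaml schema may differ.
VLLM_SETTINGS = {
    "env": {"MAX_JOBS": "8"},
    "variants": {
        "cuda-ubi9": {"env": {"VLLM_TARGET_DEVICE": "cuda"}},
        "tpu-ubi9": {"env": {"VLLM_TARGET_DEVICE": "tpu"}},
    },
}


def resolve_build_env(settings: Mapping, variant: str) -> Dict[str, str]:
    """Merge default env vars with variant-specific ones; variant values win."""
    env = dict(settings.get("env", {}))  # copy so the defaults are not mutated
    variant_cfg = settings.get("variants", {}).get(variant)
    if variant_cfg is None:
        raise KeyError(f"unknown variant: {variant}")
    env.update(variant_cfg.get("env", {}))
    return env
```

      Raising on an unknown variant makes a typo in the variant name (for example in the Makefile registration) fail loudly instead of silently building with default settings.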

      Task 4: CI/CD pipeline

      Add TPU variant to GitLab CI pipeline 

      Sub-tasks:

      1. Add TPU build and test jobs to main CI configuration
      2. Update shared CI templates to include TPU variant
      3. Test complete build pipeline with bootstrap script (./bin/bootstrap.sh)
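      For Task 4's third sub-task, a simple end-of-pipeline check could confirm that the bootstrap run produced wheels for the key TPU packages. The sketch parses wheel filenames, where the distribution name is the first `-`-separated field (wheel filename convention from PEP 427), and normalizes project names as described in PEP 503. The expected package set and the idea of scanning a single output directory are assumptions; `./bin/bootstrap.sh` may lay out its output differently.

```python
import re
from pathlib import Path
from typing import Iterable, Set

# Packages assumed to be required in a tpu-ubi9 build (hypothetical list).
EXPECTED_TPU_PACKAGES = {"torch", "torch-xla", "jax", "jaxlib", "vllm"}


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def wheel_project_names(filenames: Iterable[str]) -> Set[str]:
    """Extract normalized project names from wheel filenames, ignoring other files."""
    names = set()
    for filename in filenames:
        if filename.endswith(".whl"):
            # The distribution name is everything before the first '-'.
            names.add(normalize(filename.split("-", 1)[0]))
    return names


def missing_packages(wheel_dir: str, expected: Set[str] = EXPECTED_TPU_PACKAGES) -> Set[str]:
    """Return the expected packages that have no wheel in `wheel_dir`."""
    found = wheel_project_names(p.name for p in Path(wheel_dir).glob("*.whl"))
    return {normalize(name) for name in expected} - found
```

      Failing the CI job when `missing_packages()` returns a non-empty set would catch a package that was silently skipped during the bootstrap run.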

      Task 5: Testing, to be scoped

       

       

              Ali Raza (rh-ee-araza)
              Frank's Team