-
Spike
-
Resolution: Done
AIPCC-1446 - Build vLLM for Google TPU - Tech Preview
AIPCC Accelerators 9, AIPCC Accelerators 10, AIPCC Accelerators 11
We're adding TPU support to the builder by creating a new tpu-ubi9 variant that builds vLLM with PyTorch XLA for Google Cloud TPUs.
This is an initial division of tasks. As the work progresses, some of these will be further subdivided. For example, the sub-tasks in Tasks 1 and 2 will definitely be subdivided because they involve multiple steps. New tasks and sub-tasks will also be added as more work is identified.
Task 1: Container setup
Set up TPU build container with required development tools
Sub-tasks:
- Install Clang compiler (JAX recommends Clang)
- Install Bazel build system for JAX compilation
- Install CMake, the Ninja build system, Python development headers, and related tools
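A quick way to confirm the container has the tools listed above is a small check script. The following is a minimal sketch, not the builder's actual tooling: the tool names in REQUIRED_TOOLS are assumptions (Ninja, for example, is sometimes installed as "ninja-build"), and both function names are hypothetical.

```python
import os
import shutil
import sysconfig

# Hypothetical tool list; the binary names in the real container may differ
# (for example, Ninja is sometimes installed as "ninja-build").
REQUIRED_TOOLS = ["clang", "bazel", "cmake", "ninja"]


def find_missing_tools(tools, which=shutil.which):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_headers_present():
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_path("include")
    return os.path.isfile(os.path.join(include_dir, "Python.h"))
```

Inside the container, find_missing_tools(REQUIRED_TOOLS) should return an empty list and python_headers_present() should return True once Task 1 is complete.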
Task 2: Package build plugins
Create build plugins to compile TPU dependencies from source
Sub-tasks:
- Extend PyTorch plugin to verify TPU builds
- Create PyTorch XLA build plugin
- Create JAX/jaxlib build plugin with Bazel support
- Update vLLM plugin with TPU dependency
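One possible shape for the "verify TPU builds" step is a post-build smoke check that confirms the expected packages are installed in the build environment. This is only a sketch under assumptions: the module names in TPU_MODULES are guessed from the upstream package names, and missing_modules is a hypothetical helper, not part of any existing plugin.

```python
import importlib.util

# Assumed top-level module names, taken from the upstream package names.
TPU_MODULES = ["torch", "torch_xla", "jax", "jaxlib", "vllm"]


def missing_modules(modules, find_spec=importlib.util.find_spec):
    """Return the module names that cannot be located without importing them."""
    return [name for name in modules if find_spec(name) is None]
```

Using find_spec on top-level names avoids importing the packages, so the check stays cheap and does not need TPU hardware. It only shows that the wheels were installed, not that they were built with TPU support.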
Task 3: Build configuration
Configure build arguments and package collections for TPU
Sub-tasks:
- Create TPU build arguments file with compiler versions and source repos
- Set up TPU package collection (collections/accelerated/tpu-ubi9/)
- Add TPU-specific build environment to vLLM settings (overrides/settings/vllm.yaml)
- Register the TPU variant in the build system (i.e., changes to the Makefile)
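To illustrate the kind of check a build arguments file could get before it is used, here is a minimal sketch. It assumes a simple KEY=VALUE file format, and the key names in REQUIRED_KEYS are hypothetical placeholders; the real file may use a different format and different names.

```python
# Hypothetical key names; the real build arguments file may use others.
REQUIRED_KEYS = {"CLANG_VERSION", "BAZEL_VERSION", "TORCH_XLA_REPO", "JAX_REPO"}


def parse_build_args(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    args = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"Expected KEY=VALUE, got: {line!r}")
        args[key.strip()] = value.strip()
    return args


def missing_keys(args, required=REQUIRED_KEYS):
    """Return the required keys that are absent, in sorted order."""
    return sorted(set(required) - set(args))
```

Running missing_keys(parse_build_args(text)) on the file's contents returns an empty list when every expected compiler version and source repo is defined.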
Task 4: CI/CD pipeline
Add TPU variant to GitLab CI pipeline
Sub-tasks:
- Add TPU build and test jobs to main CI configuration
- Update shared CI templates to include TPU variant
- Test complete build pipeline with bootstrap script (./bin/bootstrap.sh)
Task 5: Testing, to be scoped