AI Platform Core Components / AIPCC-8511

[PyTorch][Upstream CI] Configure PyTorch Build for RHEL 9.6 with H200 Support

    • Type: Task
    • Resolution: Done
    • PyTorch

      Objective

      Configure the PyTorch build for RHEL 9.6, targeting the NVIDIA H200 GPU (Hopper architecture).

      Work Completed

      • Updated build workflow to support RHEL builds
      • Set the target CUDA architecture to 9.0 (the Hopper compute capability used by the H200)
      • Set up distributed build with sccache
      • Added build artifact handling
      • Configured a test matrix with 5 test shards
      • Implemented self-hosted runner integration
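
To make the 5-shard test matrix concrete, here is a minimal sketch of one way test files could be partitioned across shards. The function name `assign_shards`, the round-robin strategy, and the file names are assumptions for illustration only; the actual workflow may balance shards differently, for example by historical test duration.

```python
from typing import Dict, List

NUM_SHARDS = 5  # matches the 5 test shards listed under Work Completed


def assign_shards(test_files: List[str], num_shards: int = NUM_SHARDS) -> Dict[int, List[str]]:
    """Round-robin the sorted test files across 1-based shard numbers."""
    shards: Dict[int, List[str]] = {n: [] for n in range(1, num_shards + 1)}
    # Sorting first keeps the assignment deterministic across runners.
    for index, name in enumerate(sorted(test_files)):
        shards[index % num_shards + 1].append(name)
    return shards


if __name__ == "__main__":
    # Hypothetical file names, not taken from the PyTorch test suite.
    example = [f"test_{i:02d}.py" for i in range(12)]
    for shard, files in assign_shards(example).items():
        print(f"shard {shard}/{NUM_SHARDS}: {files}")
```

Each runner in the matrix would then execute only the files assigned to its own shard number, so every test runs exactly once across the five jobs.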

      Performance Metrics

      • Build time: ~2 hours 16 minutes
      • Artifact size: ~3.5 GB
      • Uses sccache to cache compiler output, so repeat builds reuse unchanged objects
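
As an illustration of how the architecture and sccache settings might be passed to a build, the sketch below assembles an environment with `TORCH_CUDA_ARCH_LIST` (the PyTorch build variable for target architectures) and CMake's `CMAKE_<LANG>_COMPILER_LAUNCHER` variables. The helper name `build_env` is hypothetical, and whether `rhel-build-test.yml` or `build.sh` sets these exact variables is an assumption, not something the ticket states.

```python
import os
from typing import Dict, Mapping, Optional


def build_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Hopper and sccache settings applied."""
    env = dict(os.environ if base is None else base)
    # Compute capability 9.0 corresponds to Hopper GPUs such as the H200.
    env["TORCH_CUDA_ARCH_LIST"] = "9.0"
    # CMake (3.17 or newer) reads these environment variables to wrap each
    # compiler invocation with the named launcher.
    for lang in ("C", "CXX", "CUDA"):
        env[f"CMAKE_{lang}_COMPILER_LAUNCHER"] = "sccache"
    return env


if __name__ == "__main__":
    env = build_env({})
    for key in sorted(env):
        print(f"{key}={env[key]}")
```

The returned dictionary could be handed to `subprocess.run(..., env=env)` when invoking the build script, which leaves the calling process's own environment unchanged.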

      Deliverables

      • [x] PyTorch builds successfully on RHEL 9.6
      • [x] CUDA 12.8 integration working
      • [x] H200 GPU support enabled
      • [x] Build artifacts generated correctly

      References

      • Workflow: .github/workflows/rhel-build-test.yml
      • Build script: .ci/pytorch/build.sh

              rh-ee-sugeorge Subin George
              PyTorch Infrastructure