AI Platform Core Components / AIPCC-8513

[PyTorch][Upstream CI] Optimize Test Performance with 5-Shard Configuration

    • Type: Task
    • Resolution: Done
    • PyTorch

      Problem

      The initial 3-shard configuration resulted in long per-shard execution times (~5h 26m on average).

      Solution Implemented

      Increased test sharding from 3 to 5 shards to reduce per-shard execution time.
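
To illustrate why a higher shard count shortens the longest shard, here is a minimal sketch of duration-aware sharding (greedy, longest test first). It is not the sharding logic used by PyTorch CI; the function name `assign_shards`, the test names, and the runtimes are all hypothetical.

```python
import heapq

def assign_shards(durations, num_shards):
    """Greedy longest-first assignment of tests to shards.

    durations: dict mapping a test name to its expected runtime in minutes
    (hypothetical data; real timings would come from earlier CI runs).
    """
    # Min-heap of (accumulated minutes, shard index): the lightest shard pops first.
    heap = [(0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Place the longest tests first; break ties by name for determinism.
    for name, minutes in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        load, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (load + minutes, idx))
    return shards

def longest_shard(durations, num_shards):
    """Runtime in minutes of the slowest shard under the greedy plan."""
    plan = assign_shards(durations, num_shards)
    return max(sum(durations[t] for t in shard) for shard in plan)

# Hypothetical tests and runtimes, for illustration only.
example = {
    "test_a": 30, "test_b": 30, "test_c": 20, "test_d": 20,
    "test_e": 20, "test_f": 10, "test_g": 10, "test_h": 10,
}
for n in (3, 5):
    print(n, "shards -> longest shard:", longest_shard(example, n), "minutes")
```

With the same total amount of work, the slowest shard shrinks as the shard count grows, which is the effect this change relies on.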

      Results Achieved

      Per-shard time: 5h 26m → 3h 13m
      Reduction: 40.6% per shard
      Total time: ~16 hours (unchanged, because shards still run sequentially)
      Tests per shard: ~6,800 → ~4,100
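
The figures above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the rounded averages quoted in this ticket. The rounded times give a reduction of about 40.8%, slightly above the reported 40.6%, which presumably was computed from unrounded timings (an assumption).

```python
from datetime import timedelta

before = timedelta(hours=5, minutes=26)  # average per shard, 3 shards
after = timedelta(hours=3, minutes=13)   # average per shard, 5 shards

# Dividing one timedelta by another yields a plain float ratio.
reduction = (before - after) / before

# Sequential execution: total time is shard count times per-shard time.
total_before = 3 * before
total_after = 5 * after

print(f"per-shard reduction: {reduction:.1%}")
print(f"sequential total, 3 shards: {total_before}")
print(f"sequential total, 5 shards: {total_after}")
print("approx. tests, 3 shards:", 3 * 6800, "| 5 shards:", 5 * 4100)
```

Both totals land near 16 hours and both test counts near 20,500, which is consistent with the same test suite being redistributed rather than reduced.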

      Benefits

      • [x] Faster feedback per shard
      • [x] Better load balancing
      • [x] Easier debugging (smaller test batches)
      • [x] Aligns with upstream PyTorch CI conventions (5 shards, like the Ubuntu jobs)
      • [x] Ready for parallel execution if more runners are added
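
As a rough sketch of what parallel execution could look like, the snippet below estimates wall-clock time for different runner counts. It assumes every shard takes the average 3h 13m and that all runners are identical; the helper `wall_clock` is hypothetical and not part of any CI tooling.

```python
import math
from datetime import timedelta

PER_SHARD = timedelta(hours=3, minutes=13)  # average per-shard time with 5 shards
NUM_SHARDS = 5

def wall_clock(runners):
    """Estimated wall-clock time when shards are spread evenly over `runners`.

    Assumes each shard takes the average time and runners are identical.
    """
    waves = math.ceil(NUM_SHARDS / runners)  # rounds of shards run back to back
    return waves * PER_SHARD

for runners in (1, 2, 3, 5):
    print(runners, "runner(s):", wall_clock(runners))
```

Under these assumptions a single runner stays at roughly 16 hours, while five runners would bring the wall-clock time down to the duration of one shard.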

              rh-ee-sugeorge Subin George
              PyTorch Infrastructure