- Task
- Resolution: Done
Problem
The initial 3-shard configuration produced long per-shard execution times (~5h 26m on average).
Solution Implemented
Increased the number of test shards from 3 to 5 to reduce per-shard execution time.
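To illustrate why more shards shorten the slowest shard without changing the total work, here is a minimal sketch of one possible partitioning strategy (greedy, longest test first). It is not taken from the workflow. The function name `assign_to_shards`, the test names, and the durations are all hypothetical and made up for illustration.

```python
import heapq


def assign_to_shards(durations, num_shards):
    """Greedy longest-first assignment (hypothetical helper, not the workflow's logic).

    Each test goes to whichever shard currently has the least total time.
    """
    # Heap of (total_minutes, shard_index); the lightest shard is always on top.
    heap = [(0.0, i) for i in range(num_shards)]
    heapq.heapify(heap)
    shards = [[] for _ in range(num_shards)]
    # Sort by duration descending, then by name so the result is deterministic.
    for name, minutes in sorted(durations.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        shards[idx].append(name)
        heapq.heappush(heap, (total + minutes, idx))
    totals = [0.0] * num_shards
    for total, idx in heap:
        totals[idx] = total
    return shards, totals


# Made-up durations in minutes, for illustration only.
example = {f"test_file_{i}": float(10 + (i * 7) % 23) for i in range(30)}

for n in (3, 5):
    _, totals = assign_to_shards(example, n)
    print(f"{n} shards: slowest shard {max(totals):.0f} min, total work {sum(totals):.0f} min")
```

With any reasonable balancing scheme the total work stays the same while the slowest shard gets shorter, which matches the pattern reported in the results.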
Results Achieved
Per-shard time: 5h 26m → 3h 13m
Reduction: 40.6% per shard
Total time: ~16 hours (unchanged, because the shards still run sequentially)
Tests per shard: ~6,800 → ~4,100
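As a sanity check, the figures above can be recomputed from the rounded values. This is a small sketch, and the variable names are chosen here for illustration. Note that the rounded durations give a reduction of about 40.8%. The reported 40.6% was presumably computed from unrounded timings, which are not available in this ticket.

```python
def to_minutes(hours, minutes):
    """Convert an 'Xh Ym' duration to minutes."""
    return hours * 60 + minutes


before = to_minutes(5, 26)  # average per-shard time with 3 shards
after = to_minutes(3, 13)   # average per-shard time with 5 shards

# Relative reduction in per-shard time, from the rounded figures.
reduction_pct = (before - after) / before * 100

# Sequential totals: every shard runs one after another.
total_before = 3 * before
total_after = 5 * after

# Approximate test counts implied by the per-shard figures.
tests_before = 3 * 6800
tests_after = 5 * 4100

print(f"per-shard reduction: {reduction_pct:.1f}%")
print(f"sequential total: {total_before / 60:.1f}h -> {total_after / 60:.1f}h")
print(f"approx. total tests: {tests_before} vs {tests_after}")
```

Both totals come out near 16 hours and both test counts near 20,500, which is consistent with the same suite being split into smaller pieces.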
Benefits
- [x] Faster feedback per shard
- [x] Better load balancing
- [x] Easier debugging (smaller test batches)
- [x] Aligns with PyTorch conventions (5 shards, like Ubuntu)
- [x] Ready for parallel execution if more runners are added
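The last benefit can be estimated with a rough model: shards run in waves of at most one per runner. The sketch below assumes every shard takes the average 3h 13m and ignores queueing and setup overhead. The function `wall_clock_minutes` is a hypothetical helper, not part of the workflow.

```python
import math


def wall_clock_minutes(num_shards, per_shard_minutes, num_runners):
    """Estimate wall-clock time when shards run in waves across runners.

    Assumes equal shard durations and no scheduling overhead.
    """
    waves = math.ceil(num_shards / num_runners)
    return waves * per_shard_minutes


PER_SHARD = 3 * 60 + 13  # 193 minutes, the reported 5-shard average

for runners in (1, 2, 3, 5):
    total = wall_clock_minutes(5, PER_SHARD, runners)
    print(f"{runners} runner(s): {total // 60}h {total % 60:02d}m")
```

Under these assumptions a single runner takes about 16 hours, as today, while five runners would finish in roughly the time of one shard.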
References
- Analysis: 5_SHARD_ANALYSIS_REPORT.md
- Workflow: .github/workflows/rhel-build-test.yml
- Performance run: https://github.com/subinz1/pytorch/actions/runs/20745368086