Type: Task
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: PyTorch
Labels:
- pytorch_ci

Blocked: False
Blocked Reason: None
Ready: False
Epic Link: AIPCC-8378
Problem

The initial 3-shard configuration resulted in long per-shard execution times (about 5h 26m on average).

Solution Implemented

Increased test sharding from 3 to 5 shards to reduce per-shard execution time.
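
The ticket does not say how tests are assigned to shards, so the following is only a minimal sketch of one possible scheme (round-robin assignment), not the logic used in the workflow. The function name `split_into_shards` and the total of 20,500 tests are assumptions chosen to roughly match the per-shard counts reported in this ticket.

```python
def split_into_shards(tests, num_shards):
    """Assign tests round-robin so shard sizes differ by at most one."""
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return [tests[i::num_shards] for i in range(num_shards)]


# Hypothetical test list; the real suite and its size are not given in the ticket.
tests = [f"test_{i}" for i in range(20_500)]

for num_shards in (3, 5):
    sizes = [len(shard) for shard in split_into_shards(tests, num_shards)]
    print(f"{num_shards} shards -> sizes {sizes}")
```

With these assumed numbers, 3 shards hold about 6,800 tests each and 5 shards hold 4,100 each. Round-robin balances test counts only; balancing by runtime would need per-test duration data, which this sketch does not model.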

Results Achieved

Per-shard time: 5h 26m → 3h 13m
Reduction: 40.6% shorter per-shard time
Total time: ~16 hours (unchanged, because shards still run sequentially)
Tests per shard: ~6,800 → ~4,100
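
As a quick cross-check of the figures above, the arithmetic can be reproduced from the rounded per-shard times. This is a sketch using only the numbers in this ticket; because the inputs are rounded to the minute, the computed reduction can differ slightly from the reported 40.6%.

```python
from datetime import timedelta

# Average per-shard times as reported in this ticket (rounded to the minute).
before = timedelta(hours=5, minutes=26)  # 3-shard configuration
after = timedelta(hours=3, minutes=13)   # 5-shard configuration

# Dividing one timedelta by another yields a plain float ratio.
reduction = 1 - after / before

# Shards run sequentially, so total time is shard count times per-shard time.
total_before = 3 * before
total_after = 5 * after

print(f"Per-shard reduction: {reduction:.1%}")
print(f"Total, 3 shards: {total_before}")
print(f"Total, 5 shards: {total_after}")
```

Both totals land close to 16 hours, which is consistent with the "unchanged" note: more shards shorten each shard but do not reduce the sequential total.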

Benefits

[x] Faster feedback per shard
[x] Better load balancing
[x] Easier debugging (smaller test batches)
[x] Aligns with PyTorch standards (5 shards like Ubuntu)
[x] Ready for parallel execution if more runners added
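
To illustrate the last point, here is a rough estimate of wall-clock time if more runners became available. It is a sketch under simplifying assumptions that are not from the ticket: every shard takes the reported 3h 13m average, runners are identical, and queueing and setup overhead are ignored. The helper name `estimated_wall_clock` is hypothetical.

```python
import math
from datetime import timedelta


def estimated_wall_clock(num_shards, per_shard, num_runners):
    """Estimate wall-clock time when shards are processed in waves of num_runners."""
    if num_runners < 1:
        raise ValueError("num_runners must be at least 1")
    waves = math.ceil(num_shards / num_runners)
    return waves * per_shard


per_shard = timedelta(hours=3, minutes=13)  # average reported in this ticket

for runners in (1, 2, 5):
    print(f"{runners} runner(s): {estimated_wall_clock(5, per_shard, runners)}")
```

Under these assumptions, one runner matches the current sequential total of about 16 hours, while five runners would bring wall-clock time down to a single shard's duration.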

References

Analysis: 5_SHARD_ANALYSIS_REPORT.md
Workflow: .github/workflows/rhel-build-test.yml
Performance run: https://github.com/subinz1/pytorch/actions/runs/20745368086

Assignee: Subin George
Reporter: Subin George
Team: PyTorch Infrastructure
Votes: 0
Watchers: 1
Created: 2026/01/12 5:39 AM
Updated: 2026/01/22 9:35 AM
Resolved: 2026/01/12 5:40 AM
