Type: Task
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: PyTorch
Labels:
- pytorch_ci

Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
AIPCC-8378

Objective

Analyze test results, identify failures, and document all issues for the RHEL 9.6 PyTorch build.

Work Completed

Analyzed 5-shard test results (~20,400 tests)
Identified and categorized all 215 test failures
Created comprehensive failure analysis reports
Compared RHEL vs Ubuntu CI performance and pass rates
Documented root causes for each failure category

Test Coverage Results

Total tests: ~20,400
Passed: ~20,185
Failed: ~215
Pass rate: ~98.9%

Failure Breakdown

CUTLASS Backend: ~189 tests (missing library)
Flex Attention: ~24 tests (H200 float16 alignment)
cuDNN JIT: 1 test (compilation issue)
RNN Flat Weights: 1 test (parameter handling)
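
The figures above can be cross-checked with a few lines of Python. This is a minimal sketch, not part of the original analysis: the counts are the approximate values reported in this ticket, and the names `failures_by_category`, `total_tests`, and `summarize` are hypothetical, chosen only for illustration.

```python
# Approximate counts copied from this ticket. The exact per-shard
# numbers live in the linked reports and may differ slightly.
failures_by_category = {
    "CUTLASS Backend": 189,
    "Flex Attention": 24,
    "cuDNN JIT": 1,
    "RNN Flat Weights": 1,
}
total_tests = 20_400  # approximate total across the 5 shards


def summarize(failures, total):
    """Return (failed, passed, pass_rate_percent) for the given counts."""
    failed = sum(failures.values())
    passed = total - failed
    pass_rate = 100.0 * passed / total
    return failed, passed, pass_rate


failed, passed, pass_rate = summarize(failures_by_category, total_tests)
print(f"Failed: {failed}, Passed: {passed}, Pass rate: {pass_rate:.1f}%")

# Share of all failures per category, largest first.
for name, count in sorted(
    failures_by_category.items(), key=lambda kv: kv[1], reverse=True
):
    print(f"{name}: {count} ({100.0 * count / failed:.1f}% of failures)")
```

The per-category counts sum to the reported failure total, and the CUTLASS Backend category accounts for most of the failures.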

Documentation Created

5_SHARD_FAILURE_SUMMARY.md - Detailed failure analysis
5_SHARD_ANALYSIS_REPORT.md - Performance analysis
TEST_FAILURE_REPORT.md - Initial failure investigation
RHEL_VS_UBUNTU_COMPARISON.md - Platform comparison

Deliverables

[x] Complete test failure analysis
[x] Root cause identification for all failures
[x] Comprehensive documentation
[x] Comparison with Ubuntu baseline

Assignee: Subin George

Reporter: Subin George

Team: PyTorch Infrastructure

Votes: 0

Watchers: 1

Created: 2026/01/12 5:40 AM

Updated: 2026/01/22 9:35 AM

Resolved: 2026/01/12 5:40 AM