Type: Bug
Resolution: Unresolved
Problem
24 flex attention tests with float16 strided inputs fail on H200 (Hopper architecture) due to CUDA memory misalignment errors.
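A minimal repro sketch of the input pattern described above, assuming the strided inputs come from slicing a wider buffer; the shapes, slicing, and use of torch.compile are illustrative guesses, not taken from the failing tests:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 8, 128, 64  # illustrative shapes (assumption)

# Slice one wide buffer so q/k/v are strided (non-contiguous) views
# rather than freshly allocated contiguous tensors.
buf = torch.randn(B, H, S, 3 * D, device="cuda", dtype=torch.float16)
q, k, v = buf[..., :D], buf[..., D:2 * D], buf[..., 2 * D:]

compiled_flex = torch.compile(flex_attention)
out = compiled_flex(q, k, v)  # reportedly hits a CUDA misalignment error on sm_90
```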
Root Cause
The flex attention implementation uses strided memory access patterns that trigger misaligned accesses on the H200's Hopper architecture (sm_90) when using the float16 data type.
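The alignment gap is easy to observe directly: a strided float16 view can start at an address that is not a multiple of the vector-load width, even though the underlying allocation is aligned. A small probe, assuming a 16-byte alignment requirement (a common constraint for vectorized loads; the exact Hopper requirement is an assumption here):

```python
import torch

t = torch.randn(2, 8, 128, 130, device="cuda", dtype=torch.float16)
view = t[..., 1:65]  # storage offset of one float16 element (2 bytes)

# The base allocation is aligned, so the vector-load width divides its
# address; the strided view starts 2 bytes in and no longer does.
print(t.data_ptr() % 16)     # 0
print(view.data_ptr() % 16)  # 2
```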
Impact
- Tests failing: 24
- Severity: Medium - Affects float16 precision training
- Production impact: Low - bfloat16 and float32 work fine
- Pass rate impact: Accounts for 11% of all failures (24/215)
Technical Details
- Different memory alignment requirements on Hopper architecture
- Strided tensor access patterns incompatible with sm_90 float16 (see the mitigation sketch after this list)
- Issue specific to float16 (bfloat16 and float32 variants pass)
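Consistent with the details above, copying strided float16 inputs into contiguous storage sidesteps the misaligned path. This hypothetical wrapper is a user-side mitigation only, not the upstream fix called for below:

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

def flex_attention_fp16_safe(q, k, v, **kwargs):
    # Copy strided float16 inputs into contiguous (hence aligned) storage
    # before dispatch; trades extra memory traffic for avoiding the
    # misaligned-access path on sm_90.
    if q.dtype == torch.float16:
        q, k, v = q.contiguous(), k.contiguous(), v.contiguous()
    return flex_attention(q, k, v, **kwargs)
```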
Current Workaround
The affected tests are excluded in the CI workflow configuration.
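For reference, an in-test equivalent of that exclusion might look like the following; the decorator name and skip condition are assumptions, since the real exclusion lives in the workflow configuration rather than in test code:

```python
import unittest
import torch

IS_SM90 = torch.cuda.is_available() and torch.cuda.get_device_capability() == (9, 0)

# Skip the float16 strided variants only on Hopper (sm_90) devices such as H200.
skip_fp16_strided_on_sm90 = unittest.skipIf(
    IS_SM90,
    "flex attention float16 strided inputs hit CUDA misalignment on sm_90",
)
```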
Acceptance Criteria
- [ ] Report the issue to the PyTorch upstream team
- [ ] Fix memory alignment for strided float16 access on sm_90
- [ ] All 24 flex attention float16 tests pass on H200
References
- Workflow run: https://github.com/subinz1/pytorch/actions/runs/20745368086
- Documentation: 5_SHARD_FAILURE_SUMMARY.md