Bug
Resolution: Unresolved
### 🐛 `grad_dtype` property lost when using `fully_shard` (FSDP2)
### Problem

The `grad_dtype` property of PyTorch tensors, which allows specifying the dtype of a leaf tensor's accumulated gradient, is not preserved when a module is wrapped with `torch.distributed.fsdp.fully_shard` (FSDP2). As a result, per-parameter gradient precision control is silently lost in distributed training.
### Alternatives

No response
### Additional context

No response
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @zhaojuanmao @mrshenli @rohan-varma @chauhang @mori360 @ppwwyyxx @penguinwu