Bug
Resolution: Unresolved
### 🐛 `grad_dtype` property lost when using `fully_shard` (FSDP2)
### Problem

The `grad_dtype` property of PyTorch tensors, which allows specifying the dtype of a leaf tensor's accumulated gradient, is not preserved when a module is wrapped with `torch.distributed.fsdp.fully_shard` (FSDP2). As a result, per-parameter gradient precision control is silently lost in distributed training.
### Alternatives

No response
### Additional context

No response
cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @zhaojuanmao @mrshenli @rohan-varma @chauhang @mori360 @ppwwyyxx @penguinwu