• Type: Bug
    • Resolution: Unresolved
    • Project: PyTorch
    • Sprint: PyTorch Sprint 23

        1. 🐛 `grad_dtype` property lost when using `fully_shard` (FSDP2)
          1. 🚀 The feature, motivation and pitch
            1. Problem
              The `grad_dtype` property of PyTorch tensors, which allows specifying the gradient dtype of leaf tensors, is not preserved when applying `torch.distributed.fsdp.fully_shard`. This breaks precision control in distributed training.
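      A minimal repro sketch of the report above, assuming a PyTorch build in which `Tensor.grad_dtype` is available and `fully_shard` is importable from `torch.distributed.fsdp` (the FSDP2 API); the model and helper names are illustrative, and the script is meant to be launched with `torchrun --nproc_per_node=2`:

      ```python
      import os

      import torch
      import torch.nn as nn


      def report_grad_dtypes(model: nn.Module, tag: str) -> None:
          # Print grad_dtype for each leaf parameter; fall back to "N/A"
          # on builds where the attribute does not exist.
          for name, p in model.named_parameters():
              print(f"[{tag}] {name}: grad_dtype={getattr(p, 'grad_dtype', 'N/A')}")


      def main() -> None:
          from torch.distributed.fsdp import fully_shard  # FSDP2 entry point

          torch.distributed.init_process_group("gloo")
          model = nn.Linear(8, 8)

          # Request full-precision gradients on the leaf parameters.
          for p in model.parameters():
              p.grad_dtype = torch.float32  # the property under discussion

          report_grad_dtypes(model, "before fully_shard")

          fully_shard(model)
          # After sharding, parameters are swapped for DTensor-backed ones;
          # per the report, the grad_dtype set above is no longer reflected here.
          report_grad_dtypes(model, "after fully_shard")

          torch.distributed.destroy_process_group()


      if __name__ == "__main__" and "RANK" in os.environ:
          # Only run under a torchrun-style launcher, which sets RANK.
          main()
      ```

      Expected behavior would be for `grad_dtype` to read the same before and after sharding (or for the restriction to be documented).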
          1. Alternatives

      No response

          1. Additional context

      No response

      cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @pragupta @msaroufim @dcci @zhaojuanmao @mrshenli @rohan-varma @chauhang @mori360 @ppwwyyxx @penguinwu

              rh-ee-amaitra Arkadip Maitra
              PyTorch Distributed
              Votes: 0
              Watchers: 2
