- Task
- Resolution: Unresolved
[2813270334] Upstream Reporter: James Kunstle
Upstream issue status: Open
Upstream description:
From the FSDP docs: "FSDP currently does not support gradient accumulation outside no_sync() when using CPU offloading. This is because FSDP uses the newly-reduced gradient instead of accumulating with any existing gradient, which can lead to incorrect results."
Upstream URL: https://github.com/instructlab/training/issues/414
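
The failure mode in the quoted description can be illustrated with a small plain-Python sketch. It does not use FSDP or PyTorch; the helper names (`accumulate`, `overwrite`, `run`) and the gradient values are hypothetical and exist only to show how replacing the stored gradient differs from adding to it across micro-batches.

```python
# Illustrative only: a plain-Python stand-in for a gradient buffer, not FSDP itself.
# All function names and numbers here are made up for the example.

def accumulate(existing, new):
    """Expected behaviour: add the new gradient to what is already stored."""
    return [e + n for e, n in zip(existing, new)]


def overwrite(existing, new):
    """Behaviour described in the quote: the new gradient replaces the stored one."""
    return list(new)


def run(update, micro_batch_grads):
    """Apply `update` once per micro-batch, starting from a zeroed gradient."""
    grad = [0.0] * len(micro_batch_grads[0])
    for g in micro_batch_grads:
        grad = update(grad, g)
    return grad


# Three micro-batches, each contributing a gradient for two parameters.
micro_batch_grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

accumulated = run(accumulate, micro_batch_grads)  # sum over all micro-batches
overwritten = run(overwrite, micro_batch_grads)   # only the last micro-batch survives

print("accumulated:", accumulated)
print("overwritten:", overwritten)
```

In the overwrite case the contributions of every micro-batch except the last are silently lost, which is the kind of incorrect result the quoted docs warn about.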