Type: Task
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: InstructLab - Training
Labels:
- github

Blocked: False
Blocked Reason: None
Ready: False

[2813270334] Upstream Reporter: James Kunstle
Upstream issue status: Open
Upstream description:

From the FSDP docs: "FSDP currently does not support gradient accumulation outside no_sync() when using CPU offloading. This is because FSDP uses the newly-reduced gradient instead of accumulating with any existing gradient, which can lead to incorrect results."

https://pytorch.org/docs/stable/fsdp.html
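
To illustrate why the quoted behavior can give incorrect results, here is a minimal toy sketch. It does not call PyTorch or FSDP. Plain floats stand in for gradient shards, and the names `accumulate`, `overwrite`, and `run_micro_batches` are hypothetical, chosen only for this illustration. It assumes three micro-batches contribute to one optimizer step and compares adding each newly reduced gradient to the stored one against replacing the stored one.

```python
# Toy model only: plain Python floats stand in for gradient shards.
# Nothing here calls PyTorch or FSDP; all names are hypothetical.

def accumulate(existing, reduced):
    """Expected behavior: add the newly reduced gradient to the stored one."""
    return existing + reduced


def overwrite(existing, reduced):
    """Behavior the docs warn about: the new gradient replaces the stored one."""
    return reduced


def run_micro_batches(micro_batch_grads, update):
    """Apply `update` once per micro-batch and return the final stored gradient."""
    stored = 0.0
    for grad in micro_batch_grads:
        stored = update(stored, grad)
    return stored


# Hypothetical per-micro-batch gradients for a single parameter.
micro_batch_grads = [0.5, 0.25, 0.125]

expected = run_micro_batches(micro_batch_grads, accumulate)
observed = run_micro_batches(micro_batch_grads, overwrite)

print(expected)  # sum of all micro-batch gradients
print(observed)  # only the last micro-batch's gradient survives
```

In this toy model the optimizer step would see only the last micro-batch's gradient, so the effective batch size is smaller than intended. Whether the training code hits this case depends on whether it accumulates gradients outside `no_sync()` while CPU offloading is enabled.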

Upstream URL: https://github.com/instructlab/training/issues/414

Assignee: Unassigned
Reporter: Upstream Sync
Votes: 0
Watchers: 1
Created: 2025/06/03 6:53 PM
Updated: 2025/06/03 6:53 PM