-
Task
-
Resolution: Unresolved
[2775752767] Upstream Reporter: James Kunstle
Upstream issue status: Open
Upstream description:
Currently, we keep every full-state checkpoint and every hf_format checkpoint. This uses a lot of storage for the sake of resumability. We could get the same outcome by keeping only the most recent full-state checkpoint. Keeping all checkpoints could be offered as an additional configuration option.
Upstream URL: https://github.com/instructlab/training/issues/387
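A minimal sketch of the proposed behavior, not taken from the upstream repository. It assumes each full-state checkpoint is a subdirectory of one parent directory and that the newest checkpoint is the one with the latest modification time. The function name, the `keep_all` flag, and the directory layout are hypothetical. hf_format checkpoints are left untouched.

```python
import shutil
from pathlib import Path


def prune_full_state_checkpoints(full_state_dir: Path, keep_all: bool = False) -> list[Path]:
    """Delete every full-state checkpoint except the most recent one.

    Returns the removed paths. With keep_all=True nothing is deleted,
    which mirrors the "keep all checkpoints" option suggested in the issue.
    """
    if keep_all or not full_state_dir.is_dir():
        return []

    # Assumption: one subdirectory per checkpoint, newest = latest mtime.
    checkpoints = sorted(
        (p for p in full_state_dir.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
    )

    removed = []
    for old in checkpoints[:-1]:  # everything but the newest
        shutil.rmtree(old)
        removed.append(old)
    return removed
```

Calling this right after a new full-state checkpoint has been written completely would bound storage to one resumable checkpoint, while `keep_all=True` preserves the current behavior.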