  1. Red Hat Enterprise Linux AI
  2. RHELAI-4283

[instructlab/training] Change default full-state checkpoint behavior to 'overwrite'


    • Task
    • Resolution: Unresolved
    • InstructLab - Training

      [2775752767] Upstream Reporter: James Kunstle
      Upstream issue status: Open
      Upstream description:

      Currently, we keep every full-state checkpoint and every hf_format checkpoint. This uses a lot of storage for the sake of resumability. We would get the same resume behavior if we kept only the most recent full-state checkpoint. Keeping all checkpoints could remain available as an opt-in configuration option.
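
A minimal sketch of the proposed "overwrite" behavior, for illustration only. It is not the instructlab/training implementation: the function names, the `keep_all` flag, the `step_*` directory layout, and the `state.bin` file are all hypothetical. The idea is that each new full-state checkpoint replaces the previous one unless the caller opts in to keeping every checkpoint.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def save_full_state_checkpoint(
    root: Path, step: int, payload: bytes, keep_all: bool = False
) -> Path:
    """Write a checkpoint for `step` under `root` (hypothetical layout).

    With keep_all=False (the proposed default), older checkpoints are
    deleted once the new one is fully written.
    """
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"step_{step:08d}"

    # Write into a staging directory first, so an interrupted save never
    # leaves a partial checkpoint under the final name.
    staging = Path(tempfile.mkdtemp(prefix=".staging_", dir=root))
    (staging / "state.bin").write_bytes(payload)
    if target.exists():
        shutil.rmtree(target)
    staging.rename(target)

    # Prune older checkpoints only after the new one is in place.
    if not keep_all:
        for old in root.glob("step_*"):
            if old != target and old.is_dir():
                shutil.rmtree(old)
    return target


def latest_checkpoint(root: Path) -> Optional[Path]:
    """Return the newest checkpoint directory, or None if there is none."""
    candidates = sorted(p for p in root.glob("step_*") if p.is_dir())
    return candidates[-1] if candidates else None
```

Deleting the old checkpoint only after the new one is complete means a crash during saving still leaves one resumable state on disk, at the cost of briefly holding two checkpoints.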


      Upstream URL: https://github.com/instructlab/training/issues/387

              Unassigned Unassigned
              upstream-sync Upstream Sync