  1. Red Hat Enterprise Linux AI
  2. RHELAI-4311

Llama-stack external inline provider for TRL's DPO


    • Feature
    • Resolution: Unresolved
    • InstructLab - Training

       

      Goal:
      Enable DPO training via TRL within Llama Stack, so that models can be fine-tuned using preference datasets through the existing post-training workflow.

      Acceptance Criteria:

      • DPO training can be launched via the post-training API
      • The TRL config supports key DPO hyperparameters
      • The preference dataset format (prompt, chosen, rejected) is handled correctly
      • Checkpoints and training metrics are saved
      • Training runs successfully on a single GPU
      • A test run completes successfully with any model
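
      As an illustration of the preference dataset criterion above, the following is a minimal sketch of how records could be checked before training. It assumes each record is a dict with plain-string prompt, chosen, and rejected fields; the helper names (validate_preference_record, split_valid_records) are hypothetical and are not taken from llama-stack-provider-trl or TRL.

```python
from typing import Any

# Field names come from the acceptance criteria; everything else is illustrative.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def validate_preference_record(record: dict[str, Any]) -> list[str]:
    """Return the problems found in one preference record (empty list if valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        # Assumption: fields are plain, non-empty strings.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"'{field}' must be a non-empty string")
    # A pair with identical responses carries no preference signal.
    if not problems and record["chosen"] == record["rejected"]:
        problems.append("'chosen' and 'rejected' must differ")
    return problems


def split_valid_records(records: list[dict[str, Any]]):
    """Separate usable records from invalid ones, keeping the index and reasons."""
    valid, invalid = [], []
    for index, record in enumerate(records):
        problems = validate_preference_record(record)
        if problems:
            invalid.append((index, problems))
        else:
            valid.append(record)
    return valid, invalid


if __name__ == "__main__":
    sample = [
        {"prompt": "What is 2+2?", "chosen": "4", "rejected": "5"},
        {"prompt": "Say hi", "chosen": "Hello", "rejected": "Hello"},
        {"prompt": "Missing field", "chosen": "a"},
    ]
    valid, invalid = split_valid_records(sample)
    print(f"{len(valid)} valid, {len(invalid)} invalid")
    for index, problems in invalid:
        print(index, problems)
```

      Reporting the index and reason for each rejected record, rather than failing on the first bad row, is one possible design choice; it makes dataset problems easier to fix in a single pass.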

      Repo:
      llama-stack-provider-trl

              nnarendr@redhat.com Nehanth Narendrula (Inactive)
              rh-ee-akshirsa Atharva Kshirsagar