Red Hat Enterprise Linux AI / RHELAI-4353

Learn the theory behind DPO and study multi-node distributed training


    • Type: Spike
    • Resolution: Unresolved
    • Priority: Undefined
    • Component: InstructLab - Training

      Goal

      The goal of this task is to build a deep understanding of Direct Preference Optimization (DPO), implement it from scratch for a small-scale model, and explore strategies for orchestrating distributed training jobs—particularly multi-node and multi-GPU setups—within the llama-stack framework.
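      For orientation, DPO collapses RLHF's reward-model-plus-PPO pipeline into a single classification-style loss over preference pairs. The sketch below is a minimal from-scratch version of that loss in PyTorch; the function name and tensor arguments are illustrative assumptions, not existing InstructLab or llama-stack code.

```python
# Minimal sketch of the DPO loss (illustrative; not InstructLab/llama-stack code).
# Inputs are per-sequence summed log-probabilities of the chosen (preferred) and
# rejected completions under the trainable policy and a frozen reference model;
# beta controls how far the policy may drift from the reference.
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Implicit "rewards" are the log-ratios between policy and reference.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # DPO minimizes -log sigmoid(reward_chosen - reward_rejected),
    # i.e. maximizes the margin between chosen and rejected completions.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

      Running this against a small model such as DistilGPT2 on a tiny preference dataset should be enough to sanity-check the implementation before any multi-node work.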

      Acceptance Criteria

      • Understand the theory behind DPO and how it differs from RLHF
      • Implement DPO from scratch on a small model and dataset (e.g., DistilGPT2 or LLaMA-1B)
      • Study distributed training frameworks (FSDP, DeepSpeed, PyTorch DDP)
      • Learn how multi-node/multi-GPU jobs could be orchestrated in llama-stack and write a short proposal outlining your approach (a minimal launch sketch follows this list)
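
      As one possible starting point for the last two items, the sketch below shows how such a training job might be wrapped with PyTorch FSDP and launched across nodes with torchrun. The helper name and model choice are assumptions for illustration, not an existing llama-stack API.

```python
# Hypothetical multi-node setup sketch (illustrative; not llama-stack code).
# torchrun starts one process per GPU and sets RANK/LOCAL_RANK/WORLD_SIZE;
# FSDP shards parameters, gradients, and optimizer state across all ranks.
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

def setup_fsdp_model(model_name: str = "distilgpt2") -> FSDP:
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    return FSDP(model, device_id=local_rank)
```

      A two-node, eight-GPU-per-node run could then be launched on each node with something like `torchrun --nnodes=2 --nproc-per-node=8 --rdzv-backend=c10d --rdzv-endpoint=<head-node>:29500 train_dpo.py` (the script name is a placeholder); DeepSpeed provides an equivalent launcher, and comparing the two is part of the study above.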

              Assignee: Atharva Kshirsagar
              Reporter: Nehanth Narendrula