Spike
Resolution: Unresolved
Goal
The goal of this task is to build a deep understanding of Direct Preference Optimization (DPO), implement it from scratch for a small-scale model, and explore strategies for orchestrating distributed training jobs—particularly multi-node and multi-GPU setups—within the llama-stack framework.
Acceptance Criteria
- Understand the theory behind DPO and how it differs from reward-model-based RLHF (e.g., PPO); the loss sketch after this list illustrates the core objective
- Implement DPO from scratch on a small model and dataset (e.g., DistilGPT2 or LLaMA-1B); a minimal single-pair training step is sketched below
- Study distributed training frameworks (FSDP, DeepSpeed, PyTorch DDP); see the FSDP launch skeleton at the end of this section
- Learn how multi-node/multi-GPU jobs could be orchestrated in llama-stack and write a short proposal outlining your approach
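For the theory item: DPO replaces the reward-model-plus-PPO stage of RLHF with a single supervised objective over preference pairs, minimizing -log sigma(beta * [(log pi_theta(y_w|x) - log pi_ref(y_w|x)) - (log pi_theta(y_l|x) - log pi_ref(y_l|x))]). Below is a minimal sketch of that loss, assuming per-sequence response log-probabilities have already been computed; the function name and argument names are illustrative, not from an existing codebase.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of summed token log-probs for a
    response (chosen or rejected), under the policy or the frozen
    reference model.
    """
    chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
    # -log sigmoid(delta): pushes the policy to prefer chosen over rejected
    return -F.logsigmoid(chosen_margin - rejected_margin).mean()
```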
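For the from-scratch implementation item: a rough single-preference-pair training step on DistilGPT2 via Hugging Face transformers. The sequence_logprob helper and the toy preference pair are assumptions made for illustration, and splitting the tokenized sequence at the prompt length is an approximation (tokenizers do not always split cleanly at the prompt/response boundary).

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "distilgpt2"  # small model for prototyping
tokenizer = AutoTokenizer.from_pretrained(model_name)
policy = AutoModelForCausalLM.from_pretrained(model_name)
reference = AutoModelForCausalLM.from_pretrained(model_name)
reference.eval()
for p in reference.parameters():
    p.requires_grad_(False)  # reference model stays frozen

def sequence_logprob(model, prompt, response):
    """Sum of log-probs the model assigns to the response tokens, given the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids
    logits = model(full_ids).logits[:, :-1, :]          # position t predicts token t+1
    targets = full_ids[:, 1:]
    logps = torch.log_softmax(logits, dim=-1).gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return logps[:, prompt_ids.shape[1] - 1:].sum(-1)   # keep only response positions

optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-5)
beta = 0.1

# Hypothetical preference pair (prompt, chosen, rejected)
prompt, chosen, rejected = "Question: 2+2?\nAnswer:", " 4", " 5"

pi_w = sequence_logprob(policy, prompt, chosen)
pi_l = sequence_logprob(policy, prompt, rejected)
with torch.no_grad():
    ref_w = sequence_logprob(reference, prompt, chosen)
    ref_l = sequence_logprob(reference, prompt, rejected)

loss = -F.logsigmoid(beta * ((pi_w - ref_w) - (pi_l - ref_l))).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

A real run would batch many pairs from a preference dataset and track the chosen/rejected reward margins as a sanity check, but the step above is the whole core of the algorithm.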
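For the distributed-training item: a minimal FSDP skeleton, assuming the job is launched with torchrun so RANK/LOCAL_RANK/WORLD_SIZE are set by the launcher. The llama-stack orchestration proposal could build on a launcher like this; nothing here uses llama-stack APIs.

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM

def main():
    # torchrun populates RANK, LOCAL_RANK and WORLD_SIZE for every process
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = AutoModelForCausalLM.from_pretrained("distilgpt2").cuda()
    model = FSDP(model)  # shards params, grads and optimizer state across ranks

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    # ... DPO training loop as in the single-GPU sketch, with a distributed sampler ...

    dist.destroy_process_group()

if __name__ == "__main__":
    # Example multi-node launch (2 nodes x 8 GPUs):
    # torchrun --nnodes=2 --nproc_per_node=8 \
    #   --rdzv_backend=c10d --rdzv_endpoint=<head-node>:29500 train_dpo_fsdp.py
    main()
```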