Type: Feature
Resolution: Unresolved
Goal:
Enable DPO training via TRL within Llama Stack, so that models can be fine-tuned using preference datasets through the existing post-training workflow.
Acceptance Criteria:
- DPO training can be launched via the post-training API
- The TRL config exposes key DPO hyperparameters (e.g. beta, learning rate, batch size)
- The preference dataset format (prompt, chosen, rejected) is handled correctly
- Checkpoints and training metrics are saved
- Training completes successfully on a single GPU
- A successful test run completes with any model
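To make the third criterion concrete, the sketch below shows the preference-pair record shape DPO training expects. The field names follow TRL's conventional preference-dataset columns; the validator itself is a hypothetical helper for illustration, not part of TRL or the provider.

```python
# Conventional TRL preference-dataset columns.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")

def validate_preference_record(record: dict) -> bool:
    """Return True if the record has non-empty string values for
    all of prompt/chosen/rejected."""
    return all(
        isinstance(record.get(field), str) and record[field].strip()
        for field in REQUIRED_FIELDS
    )

# One well-formed preference pair (illustrative content).
sample = {
    "prompt": "Explain what DPO training does.",
    "chosen": "DPO fine-tunes a model directly on preference pairs ...",
    "rejected": "DPO is a kind of database optimization.",
}
print(validate_preference_record(sample))            # True
print(validate_preference_record({"prompt": "hi"}))  # False: missing fields
```

A check like this could run when a preference dataset is registered, so malformed rows fail fast rather than mid-training.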
Repo:
llama-stack-provider-trl
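As a rough illustration of the hyperparameters and checkpoint/metrics criteria, here is a minimal single-GPU DPO run using TRL's public `DPOConfig`/`DPOTrainer` API. This is a sketch, not the provider's actual code: the model name, dataset, output path, and all hyperparameter values are placeholder choices, and `processing_class` assumes a recent TRL release (older versions took `tokenizer=` instead).

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder small model so the run fits on a single GPU.
model_name = "Qwen/Qwen2-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Preference dataset with prompt/chosen/rejected columns.
train_dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

config = DPOConfig(
    output_dir="./dpo-checkpoints",   # checkpoints land here
    beta=0.1,                         # DPO KL-penalty strength
    learning_rate=5e-7,
    per_device_train_batch_size=2,
    num_train_epochs=1,
    logging_steps=10,                 # training metrics logged at this interval
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
trainer.train()
trainer.save_model()
```

The provider's job would be to build an equivalent `DPOConfig` from the post-training API request and launch the trainer, rather than hard-coding values as above.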