Type: Outcome
Resolution: Unresolved
Priority: Normal
Outcome Overview
Preference tuning is used to better align models with human preferences and values. There are two main techniques: Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF). This Outcome covers RLHF.
In RLHF for LLMs (https://arxiv.org/pdf/2203.02155), the general process is:
- Generating multiple candidate responses to a given input
- Having humans evaluate and rank these responses for quality, helpfulness, accuracy, and alignment with human values
- Using this feedback to train the model to favor the responses that humans prefer (a minimal sketch of the reward-model objective follows this list)
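In RLHF, the ranked feedback is typically used to fit a reward model before the policy is optimized against it. As a minimal illustrative sketch (not the InstructLab implementation, and assuming a PyTorch-style setup), the pairwise Bradley-Terry loss below is lowest when the reward model scores the human-preferred response above the rejected one:

    # Minimal sketch of the pairwise reward-model objective used in RLHF:
    # -log(sigmoid(r_chosen - r_rejected)) decreases as the reward model
    # scores the human-preferred ("chosen") response above the "rejected" one.
    # Illustrative only; not InstructLab's implementation.
    import torch
    import torch.nn.functional as F

    def pairwise_reward_loss(chosen_scores: torch.Tensor,
                             rejected_scores: torch.Tensor) -> torch.Tensor:
        """Bradley-Terry preference loss over a batch of (chosen, rejected) pairs."""
        return -F.logsigmoid(chosen_scores - rejected_scores).mean()

    # Toy usage: scalar reward scores for three response pairs.
    chosen = torch.tensor([1.2, 0.4, 2.0])
    rejected = torch.tensor([0.3, 0.9, 1.5])
    print(pairwise_reward_loss(chosen, rejected))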
Success Criteria
- A user can apply preference tuning with InstructLab by bringing their own RLHF preference dataset (a sketch of a possible dataset layout follows).
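The exact on-disk format the feature will accept is being worked out in the GitHub issue referenced below; the layout sketched here, a JSON Lines file of prompt / chosen / rejected records, is only an assumption based on common preference datasets, not a confirmed InstructLab schema:

    # Hypothetical preference-dataset record; the field names "prompt",
    # "chosen", and "rejected" are assumptions, not a confirmed schema.
    import json

    record = {
        "prompt": "Explain what RLHF is in one sentence.",
        "chosen": "RLHF fine-tunes a model using human rankings of its own responses.",
        "rejected": "RLHF is a kind of database index.",
    }

    # Preference data is commonly stored as JSON Lines, one pair per line.
    with open("preference_data.jsonl", "w") as f:
        f.write(json.dumps(record) + "\n")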
Expected Results
- RHEL AI enables preference tuning with RLHF
GitHub reference: https://github.com/instructlab/training/issues/335
Issue links
- clones: RHELAI-2403 Support for Preference Tuning (RLAIF) (status: New)
- relates to: RHELAI-2403 Support for Preference Tuning (RLAIF) (status: New)