RHELAI-2403: Support for Preference Tuning (RLAIF)


      Outcome Overview

      Preference tuning is used to better align models with human preferences and values. There are two main techniques: Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF). This card covers RLAIF.

       

      In RLAIF for LLMs (https://arxiv.org/pdf/2309.00267), the general process is:

      1. Generate multiple candidate responses to a given input (same as in RLHF).
      2. An AI evaluator model or function provides feedback, guided by a constitution that outlines the ethical and safety principles the AI should follow.
      3. This feedback is used to train the model to favor responses aligned with those the AI evaluator preferred (a minimal sketch of this loop follows the list).
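
      The following is a minimal sketch, in Python, of the loop described above. It is illustrative only, not the InstructLab implementation: generate_candidates and judge are toy stand-ins for the policy model and the constitution-guided AI evaluator, and the resulting preference pairs would then be handed to a preference-optimization trainer (for example, DPO) in step 3.

      # Minimal RLAIF preference-pair construction sketch (illustrative; not InstructLab code).
      import random
      from dataclasses import dataclass

      # Toy constitution; in practice this would come from the user-provided document.
      CONSTITUTION = (
          "Prefer the response that is the most helpful and harmless, "
          "refuses unsafe requests, and does not fabricate facts."
      )

      @dataclass
      class PreferencePair:
          prompt: str
          chosen: str
          rejected: str

      def generate_candidates(prompt: str, n: int = 4) -> list[str]:
          # Step 1: stand-in for sampling n responses from the policy model.
          return [f"{prompt} :: candidate {i}" for i in range(n)]

      def judge(prompt: str, candidates: list[str], constitution: str) -> tuple[str, str]:
          # Step 2: stand-in for the AI evaluator. A real judge would prompt an LLM with
          # the constitution, the input, and the candidates; here the "scores" are random.
          scored = sorted(candidates, key=lambda _: random.random())
          return scored[-1], scored[0]  # (chosen, rejected)

      def build_preference_dataset(prompts: list[str]) -> list[PreferencePair]:
          # Step 3 input: preferred/rejected response pairs for preference optimization.
          pairs = []
          for p in prompts:
              chosen, rejected = judge(p, generate_candidates(p), CONSTITUTION)
              pairs.append(PreferencePair(prompt=p, chosen=chosen, rejected=rejected))
          return pairs

      if __name__ == "__main__":
          for pair in build_preference_dataset(["Summarize this report.", "Explain RLAIF."]):
              print(pair.prompt, "->", pair.chosen)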

      RLAIF has shown comparable or superior performance to RLHF on tasks like summarization, helpful dialogue generation, and harmless dialogue generation.

       

      Success Criteria

      1. A user can influence preference tuning with InstructLab by providing a preference constitution document for RLAIF (an illustrative example follows).
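
      The card does not define the format of the constitution document. Purely as an illustration, it could be a short plain-text list of principles that the AI evaluator is prompted with when ranking candidate responses, for example:

          Choose the response that is the most helpful, honest, and harmless.
          Choose the response that declines to assist with illegal or dangerous activities.
          Choose the response that does not fabricate facts, citations, or capabilities.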

      Expected Results

      1. RHEL AI enables preference tuning with RLAIF

      GitHub reference: https://github.com/instructlab/training/issues/335
