Type: Feature
Resolution: Unresolved
Priority: Major
Feature Overview
InstructLab's functionality will be extended to support preference tuning using Reinforcement Learning from AI Feedback (RLAIF). With this feature, users provide a constitution: a written set of ethical and safety principles that the AI evaluator follows when judging model outputs.
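For illustration only, a constitution might be expressed as a short, named list of plainly worded principles. The structure and field names below are hypothetical and do not represent a committed InstructLab schema.

```python
# Hypothetical constitution format: a named set of plainly worded principles.
# The field names and structure are illustrative, not a committed schema.
constitution = {
    "name": "default-safety-constitution",
    "principles": [
        "Prefer the response that is honest and acknowledges uncertainty.",
        "Prefer the response that avoids harmful, hateful, or unsafe content.",
        "Prefer the response that respects user privacy and autonomy.",
    ],
}
```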
Goals
- Enable users to provide a constitution defining the ethical and safety principles used in AI evaluation.
- Implement an RLAIF pipeline that adjusts AI behavior based on the user-defined principles.
- Extend the existing training features to include preference tuning.
- Primary users: data scientists and AI ethicists.
Requirements
- Users can define their preferred ethical and safety principles in a clear and concise format.
- The system can interpret and apply the user-defined principles to AI evaluation (one possible approach is sketched after this list).
- The system provides feedback on AI behavior based on the applied principles.
- The system allows users to refine and update their preferences as needed.
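A minimal sketch of how the system could interpret and apply user-defined principles, assuming the principles are rendered into an evaluator prompt and an AI judge picks the preferred of two candidate responses. `judge_model`, the prompt template, and the answer parsing are hypothetical stand-ins, not InstructLab's implementation.

```python
from typing import Callable, List


def build_judge_prompt(principles: List[str], prompt: str, resp_a: str, resp_b: str) -> str:
    """Render the constitution's principles into an evaluator prompt (illustrative template)."""
    rules = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(principles))
    return (
        "You are an AI evaluator. Judge the two responses against these principles:\n"
        f"{rules}\n\n"
        f"Prompt: {prompt}\nResponse A: {resp_a}\nResponse B: {resp_b}\n"
        "Answer with a single letter, A or B, for the preferred response."
    )


def prefer(judge_model: Callable[[str], str], principles: List[str],
           prompt: str, resp_a: str, resp_b: str) -> str:
    """Return 'A' or 'B' according to the AI evaluator's judgment.

    `judge_model` is any text-in, text-out callable (e.g., a chat-completion client).
    """
    answer = judge_model(build_judge_prompt(principles, prompt, resp_a, resp_b))
    return "A" if answer.strip().upper().startswith("A") else "B"
```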
Background
InstructLab needs to incorporate ethical and safety considerations into its AI alignment and evaluation process. RLAIF is a preference-tuning approach in which an AI evaluator, guided by a written set of principles (a constitution), supplies the preference feedback used to align a model, reducing reliance on large volumes of human-labeled preference data.
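As a sketch under stated assumptions, the RLAIF pipeline could use such AI-judged comparisons to assemble a preference dataset that then feeds a preference-optimization step (for example, a reward model or a DPO-style trainer). `generate` and `judge` below are hypothetical stand-ins for the policy model and the constitution-guided evaluator; this is not InstructLab's committed design.

```python
from typing import Callable, Dict, List


def collect_preferences(prompts: List[str],
                        generate: Callable[[str], str],
                        judge: Callable[[str, str, str], str]) -> List[Dict[str, str]]:
    """Build (prompt, chosen, rejected) triples from AI-judged pairwise comparisons."""
    dataset = []
    for prompt in prompts:
        resp_a, resp_b = generate(prompt), generate(prompt)  # two candidate completions
        winner = judge(prompt, resp_a, resp_b)               # 'A' or 'B' per the constitution
        chosen, rejected = (resp_a, resp_b) if winner == "A" else (resp_b, resp_a)
        dataset.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return dataset
```

The resulting triples are the standard input shape for preference-tuning methods, so the same data could back either a reward-model-based RL step or a direct preference-optimization step.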
Done
- [ ] The user can define their preferred ethical and safety principles.
- [ ] The system can interpret and apply the user-defined principles.
- [ ] The system provides feedback on AI behavior based on the applied principles.
- [ ] The system allows users to refine and update their preferences as needed.
Questions to Answer
- How will the system interpret and apply the user-defined principles?
- What feedback mechanisms will be used to evaluate AI behavior based on the applied principles?
- How will users be able to refine and update their preferences?
Out of Scope
- Integrating with external AI evaluation tools.
- Handling complex ethical and safety dilemmas.
Customer Considerations
- Ensure that the user-defined principles are clear and concise to avoid misunderstandings.
- Provide examples to help users define their principles.
- Regularly update the system to incorporate new ethical and safety principles as they emerge.