Type: Outcome
Resolution: Unresolved
Priority: Normal
Outcome Overview
Preference tuning is used to better align models with human preferences and values. There are two main techniques: Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF). This Outcome covers RLHF.
In RLHF for LLMs (https://arxiv.org/pdf/2203.02155), the general process is:
- Generating multiple candidate responses to a given input
- Having humans evaluate and rank these responses for quality, helpfulness, accuracy, and alignment with human values
- Using this feedback to train the model to favor the responses that humans prefer (a minimal sketch of the reward-model objective follows this list)
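In RLHF, the ranked feedback is typically used to fit a reward model before the policy is optimized against it. As a minimal illustrative sketch (not the InstructLab implementation, and assuming a PyTorch-style setup), the pairwise Bradley-Terry loss below is lowest when the reward model scores the human-preferred response above the rejected one:

    # Minimal sketch of the pairwise reward-model objective used in RLHF:
    # -log(sigmoid(r_chosen - r_rejected)) decreases as the reward model
    # scores the human-preferred ("chosen") response above the "rejected" one.
    # Illustrative only; not InstructLab's implementation.
    import torch
    import torch.nn.functional as F

    def pairwise_reward_loss(chosen_scores: torch.Tensor,
                             rejected_scores: torch.Tensor) -> torch.Tensor:
        """Bradley-Terry preference loss over a batch of (chosen, rejected) pairs."""
        return -F.logsigmoid(chosen_scores - rejected_scores).mean()

    # Toy usage: scalar reward scores for three response pairs.
    chosen = torch.tensor([1.2, 0.4, 2.0])
    rejected = torch.tensor([0.3, 0.9, 1.5])
    print(pairwise_reward_loss(chosen, rejected))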
Success Criteria
- A user can apply preference tuning with InstructLab by bringing their own RLHF preference dataset (a sketch of a possible dataset layout follows).
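The exact on-disk format the feature will accept is being worked out in the GitHub issue referenced below; the layout sketched here, a JSON Lines file of prompt / chosen / rejected records, is only an assumption based on common preference datasets, not a confirmed InstructLab schema:

    # Hypothetical preference-dataset record; the field names "prompt",
    # "chosen", and "rejected" are assumptions, not a confirmed schema.
    import json

    record = {
        "prompt": "Explain what RLHF is in one sentence.",
        "chosen": "RLHF fine-tunes a model using human rankings of its own responses.",
        "rejected": "RLHF is a kind of database index.",
    }

    # Preference data is commonly stored as JSON Lines, one pair per line.
    with open("preference_data.jsonl", "w") as f:
        f.write(json.dumps(record) + "\n")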
Expected Results
- RHEL AI enables preference tuning with RLHF
GitHub reference: https://github.com/instructlab/training/issues/335
Issue links
- clones: RHELAI-2403 Support for Preference Tuning (RLAIF) (status: New)
- relates to: RHELAI-2403 Support for Preference Tuning (RLAIF) (status: New)