Feature
Resolution: Unresolved
Feature Overview
This feature allows users to preference-tune a model with InstructLab by providing a dataset of examples, each containing a question, multiple candidate answers, and an indication of which answer is preferred. This enables the model to learn from human feedback and improve its performance over time.
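As an illustration only, a single preference example might look like the sketch below; the exact schema and file format are open questions (see Questions to Answer), and the field names here are hypothetical.

```python
# Hypothetical preference example; field names are illustrative, not a committed schema.
example = {
    "question": "What is the capital of France?",
    "answers": [
        "The capital of France is Paris.",
        "France's capital city is Lyon.",
    ],
    "preferred": 0,  # index into `answers` of the human-preferred answer
}
```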
Goals
- Primary User Type: Data scientists and AI engineers
Expected Functionality
- Users can provide a preference dataset containing questions, candidate answers, and the preferred answer for each question.
- The system will use this data to preference-tune the model, improving its ability to generate the kinds of responses users prefer (a minimal processing sketch follows this list).
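A minimal sketch, assuming the hypothetical record format shown above, of how such records could be expanded into (question, chosen, rejected) training pairs; this is not a committed InstructLab API, and handling of records without a preferred answer is still an open question.

```python
from typing import Dict, Iterable, List, Tuple


def to_preference_pairs(records: Iterable[Dict]) -> List[Tuple[str, str, str]]:
    """Expand each record into (question, preferred_answer, other_answer) pairs."""
    pairs = []
    for rec in records:
        preferred_idx = rec.get("preferred")
        if preferred_idx is None:
            continue  # open question: skip, reject the record, or infer a preference?
        chosen = rec["answers"][preferred_idx]
        for i, answer in enumerate(rec["answers"]):
            if i != preferred_idx:
                pairs.append((rec["question"], chosen, answer))
    return pairs
```

Each non-preferred answer is paired against the preferred one, so a question with N candidate answers yields N-1 training pairs.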
Requirements
- Develop an algorithm to process the preference dataset and fine-tune the model accordingly (see the loss sketch after this list).
- Ensure the system can handle datasets of varying size and complexity.
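Whether this feature will use a PPO-based RLHF loop, Direct Preference Optimization (DPO), or another method is not decided here; as one possibility, a DPO-style pairwise loss over the (chosen, rejected) pairs could look like the PyTorch sketch below.

```python
import torch
import torch.nn.functional as F


def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log-prob of the chosen answer under the model being tuned
    policy_rejected_logps: torch.Tensor,  # log-prob of the rejected answer under the model being tuned
    ref_chosen_logps: torch.Tensor,       # same quantities under a frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    """DPO objective: increase the margin by which the tuned model prefers
    the chosen answer relative to the reference model."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```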
Background
Reinforcement Learning from Human Feedback (RLHF) is a technique for aligning AI models with human preferences by learning from human judgments of model outputs. This feature will enable users to supply that feedback in the form of preference datasets.
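For background, RLHF pipelines typically train a reward model on exactly this kind of pairwise preference data using a Bradley-Terry style objective, where $x$ is the question, $y_w$ the preferred answer, and $y_l$ a non-preferred one:

$$
\mathcal{L}(\theta) = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim D}\Big[\log \sigma\big(r_\theta(x, y_w) - r_\theta(x, y_l)\big)\Big]
$$

The tuned model is then optimized against this learned reward (e.g., with PPO); simpler alternatives such as DPO fold the two steps into one, as sketched above.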
Done
- [ ] Develop the algorithm to process the preference dataset.
- [ ] Test the system with preference datasets of varying size and quality to confirm it produces the expected behavior.
Questions to Answer
- What data formats will be accepted for the preference dataset?
- How will the system handle cases where the preferred answer is not provided in the dataset?
- What metrics will be used to evaluate the success of the RLHF fine-tuning process?
Out of Scope
- Handling complex data preprocessing tasks.
Customer Considerations
- Users should have a basic understanding of preference datasets and RLHF to effectively use this feature.
- The system should be able to handle datasets with varying levels of quality and consistency.
- Users should be aware of the potential for bias in the preference dataset and take steps to mitigate it.