Type: Feature
Resolution: Unresolved
Priority: Major
Feature Overview
InstructLab's functionality will be extended to support preference tuning using Reinforcement Learning from AI Feedback (RLAIF). With this feature, users provide a constitution: a written set of ethical and safety principles that the AI evaluator follows when judging model outputs.
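For illustration only, a constitution might be expressed as a short, named list of plainly worded principles. The structure and field names below are hypothetical and do not represent a committed InstructLab schema.

```python
# Hypothetical constitution format: a named set of plainly worded principles.
# The field names and structure are illustrative, not a committed schema.
constitution = {
    "name": "default-safety-constitution",
    "principles": [
        "Prefer the response that is honest and acknowledges uncertainty.",
        "Prefer the response that avoids harmful, hateful, or unsafe content.",
        "Prefer the response that respects user privacy and autonomy.",
    ],
}
```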
Goals
- Enable users to provide a constitution defining the ethical and safety principles used in AI evaluation.
- Implement an RLAIF pipeline that adjusts AI behavior based on the user-defined principles.
- Extend the existing training features to include preference tuning.
- Primary users: data scientists and AI ethicists.
Requirements
- Users can define their preferred ethical and safety principles in a clear and concise format.
- The system can interpret and apply the user-defined principles to AI evaluation (one possible approach is sketched after this list).
- The system provides feedback on AI behavior based on the applied principles.
- The system allows users to refine and update their preferences as needed.
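A minimal sketch of how the system could interpret and apply user-defined principles, assuming the principles are rendered into an evaluator prompt and an AI judge picks the preferred of two candidate responses. `judge_model`, the prompt template, and the answer parsing are hypothetical stand-ins, not InstructLab's implementation.

```python
from typing import Callable, List


def build_judge_prompt(principles: List[str], prompt: str, resp_a: str, resp_b: str) -> str:
    """Render the constitution's principles into an evaluator prompt (illustrative template)."""
    rules = "\n".join(f"{i + 1}. {p}" for i, p in enumerate(principles))
    return (
        "You are an AI evaluator. Judge the two responses against these principles:\n"
        f"{rules}\n\n"
        f"Prompt: {prompt}\nResponse A: {resp_a}\nResponse B: {resp_b}\n"
        "Answer with a single letter, A or B, for the preferred response."
    )


def prefer(judge_model: Callable[[str], str], principles: List[str],
           prompt: str, resp_a: str, resp_b: str) -> str:
    """Return 'A' or 'B' according to the AI evaluator's judgment.

    `judge_model` is any text-in, text-out callable (e.g., a chat-completion client).
    """
    answer = judge_model(build_judge_prompt(principles, prompt, resp_a, resp_b))
    return "A" if answer.strip().upper().startswith("A") else "B"
```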
Background
InstructLab needs to incorporate ethical and safety considerations into its AI alignment and evaluation process. RLAIF is a preference-tuning approach in which an AI evaluator, guided by a written set of principles (a constitution), supplies the preference feedback used to align a model, reducing reliance on large volumes of human-labeled preference data.
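As a sketch under stated assumptions, the RLAIF pipeline could use such AI-judged comparisons to assemble a preference dataset that then feeds a preference-optimization step (for example, a reward model or a DPO-style trainer). `generate` and `judge` below are hypothetical stand-ins for the policy model and the constitution-guided evaluator; this is not InstructLab's committed design.

```python
from typing import Callable, Dict, List


def collect_preferences(prompts: List[str],
                        generate: Callable[[str], str],
                        judge: Callable[[str, str, str], str]) -> List[Dict[str, str]]:
    """Build (prompt, chosen, rejected) triples from AI-judged pairwise comparisons."""
    dataset = []
    for prompt in prompts:
        resp_a, resp_b = generate(prompt), generate(prompt)  # two candidate completions
        winner = judge(prompt, resp_a, resp_b)               # 'A' or 'B' per the constitution
        chosen, rejected = (resp_a, resp_b) if winner == "A" else (resp_b, resp_a)
        dataset.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return dataset
```

The resulting triples are the standard input shape for preference-tuning methods, so the same data could back either a reward-model-based RL step or a direct preference-optimization step.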
Done
- [ ] The user can define their preferred ethical and safety principles.
- [ ] The system can interpret and apply the user-defined principles.
- [ ] The system provides feedback on AI behavior based on the applied principles.
- [ ] The system allows users to refine and update their preferences as needed.
Questions to Answer
- How will the system interpret and apply the user-defined principles?
- What feedback mechanisms will be used to evaluate AI behavior based on the applied principles?
- How will users be able to refine and update their preferences?
Out of Scope
- Integrating with external AI evaluation tools.
- Handling complex ethical and safety dilemmas.
Customer Considerations
- Ensure that the user-defined principles are clear and concise to avoid misunderstandings.
- Provide examples to help users define their principles.
- Regularly update the system to incorporate new ethical and safety principles as they emerge.