- Feature
- Resolution: Unresolved
Feature Overview
InstructLab's CLI should be extended to support preference tuning with the RLAIF (Reinforcement Learning from AI Feedback) technique. This feature lets users provide a corpus or constitution outlining the ethical and safety principles that the AI evaluator in an RLAIF pipeline should follow.
Goals
- Enable users to provide a constitution of their ethical and safety principles for the RLAIF process
- Extend the ilab CLI with a new command or flag for preference tuning
- Anticipated primary user types: AI researchers, developers, and AI ethicists
Requirements
- The CLI should accept a file or input containing the ethical and safety principles in a well-known, well-defined schema.
- The CLI should validate the input to ensure it follows the structure required by the RLAIF technique.
- The CLI should trigger a pipeline to augment the training data with a dataset encoding the provided principles.
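As a rough illustration of the first two requirements, the sketch below checks a constitution file before anything is handed to a pipeline. The schema is purely hypothetical (a JSON object with a `principles` list whose entries carry `name` and `description` strings), since the actual file format is still an open question on this card. `validate_constitution` and `load_constitution` are illustrative names, not existing ilab functions.

```python
import json
from pathlib import Path

# Hypothetical schema (an assumption, not a decided format):
# {"principles": [{"name": str, "description": str}, ...]}
REQUIRED_FIELDS = ("name", "description")


def validate_constitution(data):
    """Return a list of human-readable errors; an empty list means valid."""
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    principles = data.get("principles")
    if not isinstance(principles, list) or not principles:
        return ["'principles' must be a non-empty list"]
    errors = []
    for i, item in enumerate(principles):
        if not isinstance(item, dict):
            errors.append(f"principles[{i}] must be an object")
            continue
        for field in REQUIRED_FIELDS:
            value = item.get(field)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"principles[{i}].{field} must be a non-empty string")
    return errors


def load_constitution(path):
    """Read a JSON constitution file and raise ValueError if it is invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        raise ValueError(f"{path}: not valid JSON ({exc})") from exc
    errors = validate_constitution(data)
    if errors:
        raise ValueError(f"{path}: " + "; ".join(errors))
    return data
```

Collecting all errors before failing, rather than stopping at the first one, lets a user fix a whole file in one pass, which fits the user-friendliness consideration on this card.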
Background
RLAIF is a technique for aligning models to preferences derived from a document or constitution provided by a user, with an AI evaluator rather than human annotators supplying the feedback. By allowing users to provide their own principles, InstructLab can better align models with their specific needs and values.
Done
- [ ] The CLI accepts a file or input containing the ethical and safety principles.
- [ ] The CLI validates the input to ensure it follows the structure required by the RLAIF technique.
Questions to Answer
- What file format should be used to provide the constitution of ethical and safety principles? (JSON, YAML, etc.)
- Should the AI's training data be updated in real time or during a separate training process?
Out of Scope
- The implementation of the RLAIF technique itself (see the dedicated card for it).
Customer Considerations
- Ensure the CLI is user-friendly and easy to understand, even for users without extensive technical knowledge.
- Provide clear documentation and examples to help users define their ethical and safety principles.
- Consider providing a pre-defined set of principles for users unsure how to define their own.
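One way the pre-defined set could work is as a fallback: use the user's principles when a file is supplied and a bundled default set otherwise. The sketch below assumes that behaviour. `DEFAULT_PRINCIPLES`, its contents, the JSON layout, and `resolve_principles` are all illustrative placeholders, not a proposed official set or an existing ilab API.

```python
import json
from pathlib import Path

# Illustrative placeholders only; a real default set would need review.
DEFAULT_PRINCIPLES = [
    {"name": "harmlessness", "description": "Avoid responses that could cause harm."},
    {"name": "honesty", "description": "Do not state things known to be false."},
]


def resolve_principles(constitution_path=None):
    """Return user principles if a file is given, else a copy of the defaults."""
    if constitution_path is None:
        # Copy each entry so callers cannot mutate the bundled defaults.
        return [dict(p) for p in DEFAULT_PRINCIPLES]
    # Assumes a JSON file shaped like {"principles": [...]} (hypothetical).
    data = json.loads(Path(constitution_path).read_text(encoding="utf-8"))
    return data["principles"]
```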
- is cloned by
- RHELAI-2409 [ilab] Extend CLI to support preference tuning with RLHF technique
- New