Red Hat Enterprise Linux AI
RHELAI-2411

RLAIF Preference Tuning in InstructLab

    • RHELAI-2403: Support for Preference Tuning (RLAIF)

      Feature Overview

      InstructLab's functionality will be extended to support preference tuning using Reinforcement Learning with AI Feedback (RLAIF). This feature allows users to provide a constitution outlining the desired ethical and safety principles that the AI evaluator should follow.
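The issue does not define a constitution format, so the following is a hypothetical sketch only: it assumes a constitution is a list of plain-language principles that get rendered into the AI evaluator's judging prompt. All names (`constitution`, `build_judge_prompt`) are illustrative, not the actual InstructLab schema.

```python
# Hypothetical sketch: a constitution as a list of plain-language
# principles the AI evaluator is prompted with. Not the real schema.
constitution = [
    "Prefer the response that is honest and does not fabricate facts.",
    "Prefer the response that avoids harmful or unsafe instructions.",
    "Prefer the response that respects user privacy.",
]

def build_judge_prompt(principles, prompt, response_a, response_b):
    """Render the principles and a candidate response pair into a
    judging prompt; the evaluator answers 'A' or 'B'."""
    rules = "\n".join(f"{i}. {p}" for i, p in enumerate(principles, start=1))
    return (
        f"Principles:\n{rules}\n\n"
        f"User prompt: {prompt}\n\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n\n"
        "Which response better follows the principles? Answer A or B."
    )
```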

      Goals

      • Enable users to provide a constitution defining the ethical and safety principles used in AI evaluation.
      • Implement an RLAIF pipeline that adjusts AI behavior based on user-defined principles.
      • Expand existing training features to include preference tuning functionality.
      • Primary user types: data scientists and AI ethicists.

      Requirements

      1. Users can define their preferred ethical and safety principles in a clear and concise format.
      2. The system can interpret and apply the user-defined principles to AI evaluation.
      3. The system provides feedback on AI behavior based on the applied principles.
      4. The system allows users to refine and update their preferences as needed.
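The four requirements above can be sketched as a small interface; this is a minimal illustration under assumed names (`PreferenceConfig`, `judge_fn`), not the actual InstructLab API.

```python
from dataclasses import dataclass

@dataclass
class PreferenceConfig:
    """Hypothetical sketch of requirements 1-4.
    Principles are stored as a clear list (req. 1), applied through a
    pluggable judge (reqs. 2-3), and refinable at any time (req. 4)."""
    principles: list

    def update_principles(self, new_principles):
        # Requirement 4: users refine or replace their principles.
        self.principles = list(new_principles)

    def evaluate(self, response_a, response_b, judge_fn):
        # Requirements 2-3: judge_fn stands in for the AI evaluator;
        # it receives the current principles plus both responses and
        # returns the preferred label, "A" or "B".
        return judge_fn(self.principles, response_a, response_b)
```

For example, a trivial stand-in judge that prefers the shorter response could be passed as `judge_fn` while the real AI evaluator is under development.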

      Background

      InstructLab needs to incorporate ethical and safety considerations in its AI alignment and evaluation process. RLAIF is a reinforcement learning approach in which the feedback signal (rewards or penalties) that shapes the policy is produced by an AI evaluator, guided here by the user-supplied constitution, rather than by human annotators.
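The data-collection step of such a pipeline can be sketched as follows: an AI judge, guided by the constitution, labels candidate response pairs, and the resulting (prompt, chosen, rejected) triples feed a preference-tuning optimizer such as DPO or a reward-model-based RL loop. The `judge` interface is an assumption for illustration, not the actual InstructLab pipeline.

```python
def build_preference_pairs(samples, judge):
    """Hypothetical RLAIF labeling step.
    samples: iterable of (prompt, response_a, response_b) triples.
    judge: callable returning 'A' or 'B' for the preferred response,
    guided by the user-defined constitution."""
    pairs = []
    for prompt, resp_a, resp_b in samples:
        verdict = judge(prompt, resp_a, resp_b)
        chosen, rejected = (resp_a, resp_b) if verdict == "A" else (resp_b, resp_a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs
```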

      Done

      • [ ] The user can define their preferred ethical and safety principles.
      • [ ] The system can interpret and apply the user-defined principles.
      • [ ] The system provides feedback on AI behavior based on the applied principles.
      • [ ] The system allows users to refine and update their preferences as needed.

      Questions to Answer

      • How will the system interpret and apply the user-defined principles?
      • What feedback mechanisms will be used to evaluate AI behavior based on the applied principles?
      • How will users be able to refine and update their preferences?

      Out of Scope

      • Integrating with external AI evaluation tools.
      • Handling complex ethical and safety dilemmas.

      Customer Considerations

      • Ensure that the user-defined principles are clear and concise to avoid misunderstandings.
      • Provide examples to help users define their principles.
      • Regularly update the system to incorporate new ethical and safety principles as they emerge.

              wcabanba@redhat.com William Caban
              Mustafa Eyceoz, Oleg Silkin