RHELAI-2403: Support for Preference Tuning (RLAIF)


      Outcome Overview

      Preference tuning is used to better align models with human preferences and values. There are two main techniques: Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF). This card covers RLAIF.

       

      In RLAIF for LLMs (https://arxiv.org/pdf/2309.00267), the general process is:

      1. Generate multiple candidate responses to a given input (same as in RLHF).
      2. An AI evaluator model or function provides feedback, guided by a constitution that outlines the ethical and safety principles the AI should follow.
      3. This feedback is used to train the model to favor responses aligned with those the AI evaluator preferred (a minimal sketch of this loop follows the list).
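
      The following is a minimal sketch, in Python, of the loop described above. It is illustrative only, not the InstructLab implementation: generate_candidates and judge are toy stand-ins for the policy model and the constitution-guided AI evaluator, and the resulting preference pairs would then be handed to a preference-optimization trainer (for example, DPO) in step 3.

      # Minimal RLAIF preference-pair construction sketch (illustrative; not InstructLab code).
      import random
      from dataclasses import dataclass

      # Toy constitution; in practice this would come from the user-provided document.
      CONSTITUTION = (
          "Prefer the response that is the most helpful and harmless, "
          "refuses unsafe requests, and does not fabricate facts."
      )

      @dataclass
      class PreferencePair:
          prompt: str
          chosen: str
          rejected: str

      def generate_candidates(prompt: str, n: int = 4) -> list[str]:
          # Step 1: stand-in for sampling n responses from the policy model.
          return [f"{prompt} :: candidate {i}" for i in range(n)]

      def judge(prompt: str, candidates: list[str], constitution: str) -> tuple[str, str]:
          # Step 2: stand-in for the AI evaluator. A real judge would prompt an LLM with
          # the constitution, the input, and the candidates; here the "scores" are random.
          scored = sorted(candidates, key=lambda _: random.random())
          return scored[-1], scored[0]  # (chosen, rejected)

      def build_preference_dataset(prompts: list[str]) -> list[PreferencePair]:
          # Step 3 input: preferred/rejected response pairs for preference optimization.
          pairs = []
          for p in prompts:
              chosen, rejected = judge(p, generate_candidates(p), CONSTITUTION)
              pairs.append(PreferencePair(prompt=p, chosen=chosen, rejected=rejected))
          return pairs

      if __name__ == "__main__":
          for pair in build_preference_dataset(["Summarize this report.", "Explain RLAIF."]):
              print(pair.prompt, "->", pair.chosen)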

      RLAIF has shown comparable or superior performance to RLHF on tasks like summarization, helpful dialogue generation, and harmless dialogue generation.

       

      Success Criteria

      1. A user can influence preference tuning with InstructLab by providing a preference constitution document for RLAIF (an illustrative example follows).
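
      The card does not define the format of the constitution document. Purely as an illustration, it could be a short plain-text list of principles that the AI evaluator is prompted with when ranking candidate responses, for example:

          Choose the response that is the most helpful, honest, and harmless.
          Choose the response that declines to assist with illegal or dangerous activities.
          Choose the response that does not fabricate facts, citations, or capabilities.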

      Expected Results

      1. RHEL AI enables preference tuning with RLAIF

      GitHub reference: https://github.com/instructlab/training/issues/335
