Red Hat Enterprise Linux AI / RHELAI-2374

[eval] Extending RAGAS Evaluation Framework with Additional Metrics

      Feature Overview

      RAGAS has been selected to evaluate RAG flows in RHEL AI.

      This Feature aims to enhance the RAGAS evaluation framework by identifying, adapting, and defining additional evaluation metrics from other open-source frameworks. The goal is a more comprehensive assessment of the combined performance of the RAG flow and the model across aspects such as faithfulness, correctness, relevancy, retrieval quality, and semantic similarity.
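
      For context, RAGAS is driven through a single evaluate() call over a dataset of RAG traces, and any new metrics would plug into that same flow. The snippet below is a minimal sketch of the current baseline, assuming a ragas 0.1-style API and a judge LLM already configured in the environment; column and metric names vary across ragas releases.

      ```python
      # Minimal baseline RAGAS run (a sketch; assumes a ragas 0.1-style API and
      # a judge LLM configured via the environment, e.g. OPENAI_API_KEY).
      from datasets import Dataset
      from ragas import evaluate
      from ragas.metrics import answer_relevancy, context_precision, faithfulness

      # One toy RAG trace: question, retrieved contexts, answer, and reference.
      data = {
          "question": ["What license does InstructLab use?"],
          "contexts": [["InstructLab is released under the Apache 2.0 license."]],
          "answer": ["InstructLab is licensed under Apache 2.0."],
          "ground_truth": ["Apache 2.0."],
      }

      results = evaluate(
          Dataset.from_dict(data),
          metrics=[faithfulness, answer_relevancy, context_precision],
      )
      print(results)  # per-metric scores, e.g. {'faithfulness': 1.0, ...}
      ```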

      Goals

      • Expand the RAGAS evaluation framework to include additional metrics.
      • Improve the overall performance assessment of RAG when used with fine-tuned models.
      • Enhance the understanding of RAG techniques' strengths and weaknesses across these different aspects.

      Metrics from other open-source RAG evaluation frameworks should be considered as candidates.

      Requirements

      • The new metrics must be clearly defined and measurable, without adding dependencies incompatible with the Apache 2.0 license.
      • The metrics should be compatible with, or adaptable to, the RAGAS evaluation framework (see the sketch after this list).
      • The metrics must provide insights into RAG performance beyond what the existing metrics cover.
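
      As one illustration of a metric meeting these requirements, the sketch below defines a simple retrievability probe: the reciprocal rank of the first retrieved context that contains the reference answer. It is a hypothetical, dependency-free metric (so trivially Apache 2.0 compatible) that could later be wrapped as a custom ragas metric; the exact custom-metric API differs across ragas versions.

      ```python
      # Hypothetical retrieval metric: reciprocal rank of the first retrieved
      # context containing the reference answer (a simple retrievability probe).
      def reciprocal_rank(contexts: list[str], reference: str) -> float:
          """Return 1/rank of the first context mentioning the reference, else 0.0."""
          for rank, ctx in enumerate(contexts, start=1):
              if reference.lower() in ctx.lower():
                  return 1.0 / rank
          return 0.0

      print(reciprocal_rank(
          ["An unrelated passage.", "InstructLab uses the Apache 2.0 license."],
          "Apache 2.0",
      ))  # -> 0.5: the relevant context was retrieved second
      ```

      A string-containment check is a deliberately crude relevance test; a production version would substitute embedding- or judge-based matching, but the wrapping and reporting pattern stays the same.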

      Background

      The RAGAS evaluation framework is a tool for assessing the performance of Retrieval-Augmented Generation (RAG) pipelines. However, it may cover only a subset of the metrics that should be considered when measuring the various stages of a RAG flow. Extending the RAGAS framework with additional metrics gives a more holistic view of a particular RAG approach's strengths and weaknesses.

      Done

      • [ ] The new metrics have been clearly defined, studied, and measured to determine their value when analyzing RAG techniques.
      • [ ] The new metrics either extend the RAGAS evaluation framework or are implemented natively in the InstructLab eval capability.
      • [ ] The new metrics provide additional insights into the performance of a particular RAG technique.

      Questions to Answer

      • How will we make the new metrics easy for an InstructLab user to understand and use?
      • Is the persona for the new metrics a general user or a specialized one (e.g., a data scientist)?

      Out of Scope

      • The implementation of RAG evaluation using a local LLM (this will be covered under a different Feature card).
      • A UI for the new metrics.

      Customer Considerations

      • The new metrics should provide additional insights into the performance of RAG tasks relevant to the customer's use case.
      • The new metrics should be easy to understand and to use when evaluating the improvement gained from a particular RAG technique.

              William Caban (wcabanba@redhat.com)
              Ilan Pinto, Ilya Kolchinsky, Oleg Silkin