Type: Feature
Resolution: Unresolved
Priority: Normal
Feature Overview
RAGAS has been selected to evaluate RAG flows in RHEL AI.
This Feature aims to enhance the RAGAS evaluation framework by identifying, adapting, and defining additional evaluation metrics from other open-source frameworks. This will provide a more comprehensive assessment of the combined RAG-plus-model performance across aspects such as faithfulness, correctness, relevancy, retrievability, semantic similarity, and more.
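For reference, the sketch below shows how the existing RAGAS metrics are typically driven from Python; any new metrics would ultimately be surfaced through the same kind of entry point. This is a minimal sketch assuming the upstream `ragas` and `datasets` packages (0.1.x-style API); the sample data is illustrative, and the exact dataset fields and backends should be checked against the version packaged for RHEL AI.

```python
# Minimal sketch of a RAGAS evaluation run (assumes upstream `ragas` and
# `datasets`; the dataset column names follow the 0.1.x-style API and may
# differ in other versions).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

samples = Dataset.from_dict({
    "question": ["What does RHEL AI bundle for model customization?"],
    "answer": ["RHEL AI bundles InstructLab for fine-tuning Granite models."],
    "contexts": [["RHEL AI packages InstructLab and Granite models on RHEL."]],
    "ground_truth": ["RHEL AI ships InstructLab and Granite models on top of RHEL."],
})

# Each metric yields a per-sample score in [0, 1]; evaluate() aggregates them.
# Note: upstream ragas calls out to an LLM/embedding backend (OpenAI by
# default), so credentials or a configured local backend are required.
result = evaluate(
    samples,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)
```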
Goals
- Expand the RAGAS evaluation framework to include additional metrics.
- Improve the overall performance assessment of RAG when used with fine-tuned models.
- Enhance the understanding of RAG techniques' strengths and weaknesses in different aspects.
Some of the RAG evaluation frameworks with metrics to consider (an adaptation sketch follows this list):
- LlamaIndex RAG evaluators (MIT License)
- TruLens Eval and the RAG Triad (MIT License)
- RAGEval (Apache 2.0)
- Phoenix (Reference only as this framework uses the ELv2 license)
- Massive Text Embedding Benchmark (MTEB)
- RAGLAB (MIT License)
- FlashRAG (MIT License)
- DeepEval (Apache 2.0)
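As one illustration of the kind of adaptation in scope, the sketch below implements an embedding-based semantic-similarity score of the sort the frameworks above (and MTEB-style benchmarks) rely on, as a standalone function. The use of the Apache-2.0-licensed sentence-transformers package and the specific model name are assumptions for illustration only; the real task is selecting which metrics to port and wiring them into RAGAS.

```python
# Hypothetical adaptation of an embedding-based semantic-similarity metric:
# cosine similarity between the generated answer and the reference answer.
# `sentence-transformers` and the model name are illustrative choices only.
import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(answer: str, ground_truth: str) -> float:
    """Return a similarity score in [0, 1] between answer and reference."""
    emb = _model.encode([answer, ground_truth], normalize_embeddings=True)
    # With normalized embeddings, the dot product equals cosine similarity.
    return float(np.clip(emb[0] @ emb[1], 0.0, 1.0))

print(semantic_similarity(
    "RHEL AI bundles InstructLab with Granite models.",
    "RHEL AI packages InstructLab and Granite models on RHEL.",
))
```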
Requirements
- The new metrics must be clearly defined and measurable, without adding dependencies that are incompatible with the Apache 2.0 license.
- The metrics should be compatible with, or adaptable to, the RAGAS evaluation framework (one possible interface shape is sketched after this list).
- The metrics must provide additional insights into the RAG performance beyond what is covered by the existing metrics.
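One possible reading of "compatible or adaptable" is sketched below: each new metric is wrapped behind a small, uniform scoring interface so a single harness can run it alongside the RAGAS metrics. The `RAGSample`, `RAGMetric`, and `run_metrics` names are hypothetical and not part of the ragas package; the actual integration surface is a design decision for this Feature.

```python
# Hypothetical uniform interface for plugging new RAG metrics into a common
# harness (names are illustrative, not part of the upstream ragas API).
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class RAGSample:
    question: str
    answer: str
    contexts: list[str] = field(default_factory=list)
    ground_truth: str | None = None

class RAGMetric(Protocol):
    name: str

    def score(self, sample: RAGSample) -> float:
        """Return a score in [0, 1] for a single RAG sample."""
        ...

def run_metrics(samples: list[RAGSample], metrics: list[RAGMetric]) -> dict[str, float]:
    """Average each metric over the evaluation set."""
    return {
        m.name: sum(m.score(s) for s in samples) / len(samples)
        for m in metrics
    }
```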
Background
The RAGAS evaluation framework is a tool for assessing the performance of RAG (Retrieval-Augmented Generation) functionalities. However, it may only cover a subset of the metrics that should be considered when measuring the various tasks of a RAG flow. By extending the RAGAS framework with additional metrics, we can gain a more holistic view of a particular RAG approach's strengths and weaknesses.
Done
- [ ] The new metrics have been clearly defined, studied, and measured to determine their value when analyzing RAG techniques.
- [ ] The new metrics can extend the RAGAS evaluation framework or be implemented in a way that becomes native to the InstructLab eval capability.
- [ ] The new metrics provide additional insights into the performance of a particular RAG technique.
Questions to Answer
- How will we make the new metrics easy for an InstructLab user to understand and use?
- Is the target persona for the new metrics a general user or a specialized user (e.g., a data scientist)?
Out of Scope
- The implementation of RAG evaluation using a local LLM (this will be covered under a different Feature card).
- A UI for the new metrics.
Customer Considerations
- The new metrics should provide additional insights into the performance of RAG tasks relevant to the customer's use case.
- The new metrics should be easy to understand and use when evaluating the improvement resulting from a particular RAG technique.
- is blocked by: RHELAI-2397 [eval] Downstream RAGAS as RAG Evaluation framework (New)
- is depended on by: RHELAI-2375 [eval] Local LLM for RAG Evaluation Framework (New)
- split from: RHELAI-2309 [eval] RAG Evaluation Framework and Metrics (New)