Type: Feature
Resolution: Unresolved
Priority: Normal
Feature Overview
RAGAS has been selected to evaluate RAG flows in RHEL AI.
This Feature aims to enhance the RAGAS evaluation framework by identifying, adapting, and defining additional evaluation metrics from other open-source frameworks. This will provide a more comprehensive assessment of the combined RAG-plus-model performance across aspects such as faithfulness, correctness, relevancy, retrievability, semantic similarity, and more.
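For reference, the sketch below shows how the existing RAGAS metrics are typically driven from Python; any new metrics would ultimately be surfaced through the same kind of entry point. This is a minimal sketch assuming the upstream `ragas` and `datasets` packages (0.1.x-style API); the sample data is illustrative, and the exact dataset fields and backends should be checked against the version packaged for RHEL AI.

```python
# Minimal sketch of a RAGAS evaluation run (assumes upstream `ragas` and
# `datasets`; the dataset column names follow the 0.1.x-style API and may
# differ in other versions).
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    faithfulness,
    answer_relevancy,
    context_precision,
    context_recall,
)

samples = Dataset.from_dict({
    "question": ["What does RHEL AI bundle for model customization?"],
    "answer": ["RHEL AI bundles InstructLab for fine-tuning Granite models."],
    "contexts": [["RHEL AI packages InstructLab and Granite models on RHEL."]],
    "ground_truth": ["RHEL AI ships InstructLab and Granite models on top of RHEL."],
})

# Each metric yields a per-sample score in [0, 1]; evaluate() aggregates them.
# Note: upstream ragas calls out to an LLM/embedding backend (OpenAI by
# default), so credentials or a configured local backend are required.
result = evaluate(
    samples,
    metrics=[faithfulness, answer_relevancy, context_precision, context_recall],
)
print(result)
```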
Goals
- Expand the RAGAS evaluation framework to include additional metrics.
- Improve the overall performance assessment of RAG when used with fine-tuned models.
- Enhance the understanding of RAG techniques' strengths and weaknesses in different aspects.
Some of the RAG evaluation frameworks with metrics to consider (an adaptation sketch follows this list):
- LlamaIndex RAG evaluators (MIT License)
- TruLens Eval and the RAG Triad (MIT License)
- RAGEval (Apache 2.0)
- Phoenix (Reference only as this framework uses the ELv2 license)
- Massive Text Embedding Benchmark (MTEB)
- RAGLAB (MIT License)
- FlashRAG (MIT License)
- DeepEval (Apache 2.0)
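As one illustration of the kind of adaptation in scope, the sketch below implements an embedding-based semantic-similarity score of the sort the frameworks above (and MTEB-style benchmarks) rely on, as a standalone function. The use of the Apache-2.0-licensed sentence-transformers package and the specific model name are assumptions for illustration only; the real task is selecting which metrics to port and wiring them into RAGAS.

```python
# Hypothetical adaptation of an embedding-based semantic-similarity metric:
# cosine similarity between the generated answer and the reference answer.
# `sentence-transformers` and the model name are illustrative choices only.
import numpy as np
from sentence_transformers import SentenceTransformer

_model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(answer: str, ground_truth: str) -> float:
    """Return a similarity score in [0, 1] between answer and reference."""
    emb = _model.encode([answer, ground_truth], normalize_embeddings=True)
    # With normalized embeddings, the dot product equals cosine similarity.
    return float(np.clip(emb[0] @ emb[1], 0.0, 1.0))

print(semantic_similarity(
    "RHEL AI bundles InstructLab with Granite models.",
    "RHEL AI packages InstructLab and Granite models on RHEL.",
))
```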
Requirements
- The new metrics must be clearly defined and measurable, without adding dependencies that are incompatible with the Apache 2.0 license.
- The metrics should be compatible with, or adaptable to, the RAGAS evaluation framework (one possible interface shape is sketched after this list).
- The metrics must provide additional insights into the RAG performance beyond what is covered by the existing metrics.
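One possible reading of "compatible or adaptable" is sketched below: each new metric is wrapped behind a small, uniform scoring interface so a single harness can run it alongside the RAGAS metrics. The `RAGSample`, `RAGMetric`, and `run_metrics` names are hypothetical and not part of the ragas package; the actual integration surface is a design decision for this Feature.

```python
# Hypothetical uniform interface for plugging new RAG metrics into a common
# harness (names are illustrative, not part of the upstream ragas API).
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class RAGSample:
    question: str
    answer: str
    contexts: list[str] = field(default_factory=list)
    ground_truth: str | None = None

class RAGMetric(Protocol):
    name: str

    def score(self, sample: RAGSample) -> float:
        """Return a score in [0, 1] for a single RAG sample."""
        ...

def run_metrics(samples: list[RAGSample], metrics: list[RAGMetric]) -> dict[str, float]:
    """Average each metric over the evaluation set."""
    return {
        m.name: sum(m.score(s) for s in samples) / len(samples)
        for m in metrics
    }
```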
Background
The RAGAS evaluation framework is a tool for assessing the performance of RAG (Retrieval-Augmented Generation) functionalities. However, it may only cover a subset of the metrics that should be considered when measuring the various tasks of a RAG flow. By extending the RAGAS framework with additional metrics, we can gain a more holistic view of a particular RAG approach's strengths and weaknesses.
Done
- [ ] The new metrics have been clearly defined, studied, and measured to determine their value when analyzing RAG techniques.
- [ ] The new metrics can extend the RAGAS evaluation framework or be implemented in a way that becomes native to the InstructLab eval capability.
- [ ] The new metrics provide additional insights into the performance of a particular RAG technique.
Questions to Answer
- How will we make the new metrics easy for an InstructLab user to understand and use?
- Is the target persona for the new metrics a general user or a specialized user (e.g., a data scientist)?
Out of Scope
- The implementation of RAG evaluation using a local LLM (this will be covered under a different Feature card).
- A UI for the new metrics.
Customer Considerations
- The new metrics should provide additional insights into the performance of RAG tasks relevant to the customer's use case.
- The new metrics should be easy to understand and use when evaluating the improvement resulting from a particular RAG technique.
- is blocked by: RHELAI-2397 [eval] Downstream RAGAS as RAG Evaluation framework (New)
- is depended on by: RHELAI-2375 [eval] Local LLM for RAG Evaluation Framework (New)
- split from: RHELAI-2309 [eval] RAG Evaluation Framework and Metrics (New)