Red Hat Enterprise Linux AI / RHELAI-2397

[eval] Downstream RAGAS as RAG Evaluation framework


    • Type: Feature
    • Priority: Normal
    • Resolution: Unresolved
    • Progress: 50% To Do, 50% In Progress, 0% Done

      Feature Overview (mandatory - Complete while in New status)

      As part of the RAG artifacts in InstructLab, we need a RAG evaluation framework to measure the performance and quality of the RAG pipelines built for this capability. To maintain consistency with the research and OCTO work, we need to downstream RAGAS as a dependency.

      RAGAS is becoming the de facto standard for RAG evaluation. By productizing RAGAS within InstructLab, we can provide users with enterprise-grade evaluation capabilities while maintaining consistency with existing workflows across research, OCTO, and other internal work.

      This card covers the work to integrate and productize the RAGAS evaluation framework in InstructLab, providing automated, comprehensive assessment of RAG system performance. The feature will enable InstructLab to evaluate and optimize RAG implementations through standardized metrics and detailed performance insights.
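      As a rough illustration of the intended workflow (not the final productized interface), the sketch below scores a single question/answer/contexts sample with the upstream RAGAS evaluate() call. The column names and metric imports follow the RAGAS 0.1.x conventions and may differ in other releases; upstream RAGAS also defaults to an OpenAI judge model unless an llm/embeddings pair is supplied, so the downstream integration would need to point this at an InstructLab-served model instead.

```python
# Minimal sketch of a RAGAS evaluation run (upstream 0.1.x-style API).
# Assumes a judge LLM is configured (OpenAI by default upstream); the
# downstream integration would swap in an InstructLab-served model.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# One evaluation record: the question, the RAG pipeline's answer, the
# retrieved contexts, and a reference answer ("ground_truth" in 0.1.x,
# needed for correctness and context precision/recall).
records = {
    "question": ["What does RHEL AI ship for RAG evaluation?"],
    "answer": ["It packages the RAGAS framework for automated scoring."],
    "contexts": [[
        "InstructLab integrates RAGAS to evaluate RAG pipelines with "
        "standardized metrics such as faithfulness and answer relevancy."
    ]],
    "ground_truth": ["RAGAS, integrated into InstructLab."],
}

result = evaluate(
    Dataset.from_dict(records),
    metrics=[
        answer_relevancy,
        answer_correctness,
        context_precision,
        context_recall,
        faithfulness,
    ],
)
print(result)  # aggregate score per metric
```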

       

      Goals

      • Primary use cases: internal RAG pipelines and POCs combining RAG with fine-tuned LLMs
      • Enable automated evaluation of RAG artifacts using industry-standard RAGAS metrics
      • Expand existing model evaluation capabilities to include RAG-specific performance analysis
      • Integrate seamlessly with current InstructLab workflows

       

      Requirements

      • Implement core RAGAS evaluation metrics:
        • Answer Relevancy scoring
        • Context Relevancy assessment
        • Faithfulness measurement
        • Answer Correctness validation
        • Context Precision/Recall analysis
      • Data Processing Requirements:
        • Use the Docling / InstructLab ingestion pipeline
      • Reports Requirements:
        • Exportable reports in Parquet or similar formats for consumption outside InstructLab (see the combined ingestion and reporting sketch after this list)
      • Integration Requirements:
        • Documentation and usage examples
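
      A hypothetical sketch of how the Data Processing and Reports requirements could connect: Docling converts a source document into text that becomes the retrieval context for an evaluation sample, and the RAGAS result is written out as a Parquet report. The file names, single-metric run, and naive chunking are placeholders; the Docling calls follow the docling 2.x DocumentConverter API, and the Parquet export relies on pandas/pyarrow.

```python
# Hypothetical glue between the Docling ingestion pipeline and the
# exportable-report requirement; file names and chunking are placeholders.
from datasets import Dataset
from docling.document_converter import DocumentConverter
from ragas import evaluate
from ragas.metrics import faithfulness

# Convert the source document with Docling and use (part of) the text as
# the retrieval context; the real pipeline would reuse the InstructLab /
# Docling chunking configuration instead of this naive slice.
conversion = DocumentConverter().convert("knowledge_source.pdf")
markdown = conversion.document.export_to_markdown()
context_chunk = markdown[:1000]

sample = Dataset.from_dict({
    "question": ["<question answered by the RAG pipeline>"],
    "answer": ["<answer produced by the RAG pipeline>"],
    "contexts": [[context_chunk]],
})

result = evaluate(sample, metrics=[faithfulness])

# Exportable report: one row per evaluated sample plus metric columns,
# written as Parquet so it can be consumed outside InstructLab.
result.to_pandas().to_parquet("ragas_report.parquet")
```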

       

      Done:

      • Core RAGAS metrics implemented and validated
      • Integration tests passing with >75% coverage (see the test sketch after this list)
      • User interface components developed and tested
      • Performance benchmarks established
      • User documentation
      • (stretch goal) Tutorials published
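
      As one possible shape for the integration-test criterion, the hedged sketch below only asserts that the configured RAGAS metrics return normalized scores. It needs a configured judge LLM (OpenAI by default upstream), so it belongs in the integration suite rather than the unit tests, and the column names assume the RAGAS 0.1.x to_pandas() layout.

```python
# Hedged sketch of an integration test: asserts only that each metric
# yields a score in [0, 1]. Requires a judge LLM, so it is marked as an
# integration test rather than a unit test.
import pytest
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness


@pytest.mark.integration
def test_ragas_scores_are_normalized():
    sample = Dataset.from_dict({
        "question": ["What framework is being downstreamed for RAG evaluation?"],
        "answer": ["RAGAS is being downstreamed into InstructLab."],
        "contexts": [["InstructLab downstreams RAGAS as its RAG evaluation framework."]],
    })
    result = evaluate(sample, metrics=[answer_relevancy, faithfulness])
    scores = result.to_pandas()[["answer_relevancy", "faithfulness"]].iloc[0]
    assert all(0.0 <= s <= 1.0 for s in scores)
```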

       

      Questions to Answer

      1. What is the expected scale of concurrent evaluations?
      2. How will we handle version compatibility with upstream RAGAS?
      3. What level of customization should we allow for evaluation parameters?
      4. How will we store and manage evaluation results?
      5. What is the expected latency for real-time evaluations?

       

      Out of Scope

      • Custom LLM integration for evaluation
      • Real-time streaming evaluation
      • Automated optimization of RAG systems
      • Integration with third-party evaluation frameworks
      • Historical evaluation data migration
      • Custom metric development

       

      Customer Considerations

      • Performance Impact: Minimize impact on existing system performance
      • Learning Curve: Ensure a smooth learning curve for existing InstructLab users

              Assignee: William Caban (wcabanba@redhat.com)
              Reporter: William Caban (wcabanba@redhat.com)
              Contributors: Ilya Kolchinsky, Oleg Silkin