Red Hat Enterprise Linux AI / RHELAI-2397

[eval] Downstream RAGAS as RAG Evaluation framework


    • Type: Feature
    • Priority: Normal
    • Resolution: Unresolved
    • Progress: 50% To Do, 50% In Progress, 0% Done

      Feature Overview (mandatory - Complete while in New status)

      As part of the RAG artifacts in InstructLab, we need a RAG evaluation framework to measure the performance and quality of the RAG pipelines built for this capability. To maintain consistency with the research and OCTO work, we need to downstream RAGAS as a dependency.

      RAGAS is becoming the de facto standard for RAG evaluation. By productizing RAGAS within InstructLab, we can provide users with enterprise-grade evaluation capabilities while maintaining consistency with existing workflows across research, OCTO, and other internal work.

      This card covers the work to integrate and productize the RAGAS evaluation framework in InstructLab, providing automated, comprehensive assessment of RAG system performance. The feature will enable InstructLab to evaluate and optimize RAG implementations through standardized metrics and detailed performance insights.
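      As a rough illustration of the intended workflow (not the final productized interface), the sketch below scores a single question/answer/contexts sample with the upstream RAGAS evaluate() call. The column names and metric imports follow the RAGAS 0.1.x conventions and may differ in other releases; upstream RAGAS also defaults to an OpenAI judge model unless an llm/embeddings pair is supplied, so the downstream integration would need to point this at an InstructLab-served model instead.

```python
# Minimal sketch of a RAGAS evaluation run (upstream 0.1.x-style API).
# Assumes a judge LLM is configured (OpenAI by default upstream); the
# downstream integration would swap in an InstructLab-served model.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    answer_correctness,
    answer_relevancy,
    context_precision,
    context_recall,
    faithfulness,
)

# One evaluation record: the question, the RAG pipeline's answer, the
# retrieved contexts, and a reference answer ("ground_truth" in 0.1.x,
# needed for correctness and context precision/recall).
records = {
    "question": ["What does RHEL AI ship for RAG evaluation?"],
    "answer": ["It packages the RAGAS framework for automated scoring."],
    "contexts": [[
        "InstructLab integrates RAGAS to evaluate RAG pipelines with "
        "standardized metrics such as faithfulness and answer relevancy."
    ]],
    "ground_truth": ["RAGAS, integrated into InstructLab."],
}

result = evaluate(
    Dataset.from_dict(records),
    metrics=[
        answer_relevancy,
        answer_correctness,
        context_precision,
        context_recall,
        faithfulness,
    ],
)
print(result)  # aggregate score per metric
```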

       

      Goals

      • Primary use cases: internal RAG pipelines and POCs combining RAG with fine-tuned LLMs
      • Enable automated evaluation of RAG artifacts using industry-standard RAGAS metrics
      • Expand existing model evaluation capabilities to include RAG-specific performance analysis
      • Integrate seamlessly with current InstructLab workflows

       

      Requirements

      • Implement core RAGAS evaluation metrics:
        • Answer Relevancy scoring
        • Context Relevancy assessment
        • Faithfulness measurement
        • Answer Correctness validation
        • Context Precision/Recall analysis
      • Data Processing Requirements:
        • Use the Docling / InstructLab ingestion pipeline
      • Reports Requirements:
        • Exportable reports in Parquet or similar formats for consumption outside InstructLab (see the combined ingestion and reporting sketch after this list)
      • Integration Requirements:
        • Documentation and usage examples
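
      A hypothetical sketch of how the Data Processing and Reports requirements could connect: Docling converts a source document into text that becomes the retrieval context for an evaluation sample, and the RAGAS result is written out as a Parquet report. The file names, single-metric run, and naive chunking are placeholders; the Docling calls follow the docling 2.x DocumentConverter API, and the Parquet export relies on pandas/pyarrow.

```python
# Hypothetical glue between the Docling ingestion pipeline and the
# exportable-report requirement; file names and chunking are placeholders.
from datasets import Dataset
from docling.document_converter import DocumentConverter
from ragas import evaluate
from ragas.metrics import faithfulness

# Convert the source document with Docling and use (part of) the text as
# the retrieval context; the real pipeline would reuse the InstructLab /
# Docling chunking configuration instead of this naive slice.
conversion = DocumentConverter().convert("knowledge_source.pdf")
markdown = conversion.document.export_to_markdown()
context_chunk = markdown[:1000]

sample = Dataset.from_dict({
    "question": ["<question answered by the RAG pipeline>"],
    "answer": ["<answer produced by the RAG pipeline>"],
    "contexts": [[context_chunk]],
})

result = evaluate(sample, metrics=[faithfulness])

# Exportable report: one row per evaluated sample plus metric columns,
# written as Parquet so it can be consumed outside InstructLab.
result.to_pandas().to_parquet("ragas_report.parquet")
```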

       

      Done:

      • Core RAGAS metrics implemented and validated
      • Integration tests passing with >75% coverage (see the test sketch after this list)
      • User interface components developed and tested
      • Performance benchmarks established
      • User documentation
      • (stretch goal) Tutorials published
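
      As one possible shape for the integration-test criterion, the hedged sketch below only asserts that the configured RAGAS metrics return normalized scores. It needs a configured judge LLM (OpenAI by default upstream), so it belongs in the integration suite rather than the unit tests, and the column names assume the RAGAS 0.1.x to_pandas() layout.

```python
# Hedged sketch of an integration test: asserts only that each metric
# yields a score in [0, 1]. Requires a judge LLM, so it is marked as an
# integration test rather than a unit test.
import pytest
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness


@pytest.mark.integration
def test_ragas_scores_are_normalized():
    sample = Dataset.from_dict({
        "question": ["What framework is being downstreamed for RAG evaluation?"],
        "answer": ["RAGAS is being downstreamed into InstructLab."],
        "contexts": [["InstructLab downstreams RAGAS as its RAG evaluation framework."]],
    })
    result = evaluate(sample, metrics=[answer_relevancy, faithfulness])
    scores = result.to_pandas()[["answer_relevancy", "faithfulness"]].iloc[0]
    assert all(0.0 <= s <= 1.0 for s in scores)
```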

       

      Questions to Answer

      1. What is the expected scale of concurrent evaluations?
      2. How will we handle version compatibility with upstream RAGAS?
      3. What level of customization should we allow for evaluation parameters?
      4. How will we store and manage evaluation results?
      5. What is the expected latency for real-time evaluations?

       

      Out of Scope

      • Custom LLM integration for evaluation
      • Real-time streaming evaluation
      • Automated optimization of RAG systems
      • Integration with third-party evaluation frameworks
      • Historical evaluation data migration
      • Custom metric development

       

      Customer Considerations

      • Performance Impact: Minimize impact on existing system performance
      • Learning Curve: Ensure a smooth learning curve for existing InstructLab users

              Assignee: William Caban (wcabanba@redhat.com)
              Reporter: William Caban (wcabanba@redhat.com)
              Contributors: Ilya Kolchinsky, Oleg Silkin