
    • Strategic Product Work
    • False
    • False
    • OCPSTRAT-895 OpenShift Lightspeed GA
    • 50% To Do, 25% In Progress, 25% Done
    • 0
    • Program Call

      Background
      A high-quality RAG process focuses on three areas of optimization:

      1. Contextualized splitter function
      2. Embedding techniques and rich metadata
      3. Retrieval techniques

       
      This Feature card covers the first area, the contextualized splitter function. The idea is to adopt a splitter function that retains context within chunks. For example, a YAML example should stay whole within a single chunk, and notes in a document should stay in the same chunk as the code block or section they belong to.
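
      To make the idea concrete, here is a minimal sketch of a context-preserving splitter, assuming Markdown input with fenced code blocks. It is not the splitter this card will deliver; the names `split_blocks`, `pack_chunks`, and `max_chars` are hypothetical and exist only for illustration. Fenced code blocks are treated as atomic, so a YAML example is never cut in the middle.

```python
FENCE = "`" * 3  # Markdown code-fence marker, built this way to avoid nesting fences


def split_blocks(text):
    """Split Markdown into blank-line-separated blocks.

    A fenced code block is kept as one block, even if it contains blank lines.
    """
    blocks, current, in_code = [], [], False
    for line in text.splitlines():
        if line.strip().startswith(FENCE):
            in_code = not in_code  # entering or leaving a fenced block
        if not line.strip() and not in_code:
            # A blank line outside code ends the current block.
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:
        blocks.append("\n".join(current))
    return blocks


def pack_chunks(blocks, max_chars=400):
    """Greedily pack whole blocks into chunks of at most max_chars.

    A block is never split; a single oversized block becomes its own chunk
    and may exceed max_chars (an assumption of this sketch).
    """
    chunks, current = [], ""
    for block in blocks:
        candidate = block if not current else current + "\n\n" + block
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = block
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Usage: a small document with a YAML example that contains a blank line.
doc = "\n".join([
    "# Create a pod",
    "",
    "Apply the following manifest:",
    "",
    FENCE + "yaml",
    "apiVersion: v1",
    "",
    "kind: Pod",
    FENCE,
    "",
    "Note: the pod runs in the default namespace.",
])
chunks = pack_chunks(split_blocks(doc), max_chars=60)
for chunk in chunks:
    print(repr(chunk))
```

      A naive blank-line or fixed-size splitter would cut the manifest at the empty line inside the fence. Here the whole YAML block stays in one chunk. Keeping notes, lists, and tables attached to their section would need further rules on top of this.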
       
      Deliverables

      • Evaluate the quality of retrievals when using the MarkdownHeaderTextSplitter for creating chunks for embeddings
        • Compare against retrievals produced with other splitters:
          • Semantic Chunking
          • RecursiveCharacterTextSplitter
          • CodeTextSplitter
      • Evaluate the quality of retrievals when using a custom splitter function for defining chunks for embeddings
        • Contextualization of code blocks, lists, tables, notes, sections, and images
      • Write up the findings on improvements gained from preserving document context
      • Update the text splitter in the RAG embedding pipeline based on the findings
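
      One way the splitter comparison could be set up is sketched below, under loud assumptions: the retriever is a toy token-overlap ranker standing in for a real embedding model, the corpus and labelled queries are invented for illustration, and `hit_rate`, `fixed_size_split`, and `paragraph_split` are hypothetical helper names. None of this is the team's actual evaluation method. It only shows the shape of a hit-rate comparison between two chunking strategies.

```python
import re


def fixed_size_split(text, size=40):
    """Baseline: cut the text every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_split(text):
    """Structure-aware stand-in: split on blank lines."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def tokens(s):
    """Lowercased alphanumeric tokens, as a set."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(query, chunks, k=1):
    """Toy retriever: rank chunks by token overlap with the query."""
    q = tokens(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]


def hit_rate(splitter, corpus, labelled_queries, k=1):
    """Fraction of queries whose expected snippet appears in a top-k chunk."""
    chunks = splitter(corpus)
    hits = sum(
        any(answer in chunk for chunk in retrieve(query, chunks, k))
        for query, answer in labelled_queries
    )
    return hits / len(labelled_queries)


# Invented corpus and (query, expected snippet) pairs, for illustration only.
corpus = "\n\n".join([
    "Set replicas in the Deployment spec.",
    "replicas: 3 scales the workload to three pods.",
    "Use oc get pods to list running pods.",
])
labelled_queries = [
    ("how to list running pods", "oc get pods"),
    ("scale the workload", "replicas: 3"),
]

fixed_score = hit_rate(fixed_size_split, corpus, labelled_queries)
paragraph_score = hit_rate(paragraph_split, corpus, labelled_queries)
print("fixed-size:", fixed_score)
print("paragraph: ", paragraph_score)
```

      On this toy corpus the fixed-size splitter cuts the `replicas: 3` snippet across two chunks, so no single chunk can satisfy that query. A real evaluation would swap in the candidate splitters listed above, an actual embedding model, and a labelled query set drawn from the product documentation.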

              gausingh@redhat.com Gaurav Singh
              wcabanba@redhat.com William Caban