    • OCPSTRAT-895 OpenShift LightSpeed GA

      Background
      A high-quality RAG process focuses on three areas of optimization:

      1. Contextualized splitter function
      2. Embedding techniques and rich metadata
      3. Retrieval techniques

       
      This Feature card covers the first area, the contextualized splitter function. The idea is to adopt a splitter function that retains context within chunks. For example, a YAML example should stay whole within a single chunk, and notes in a document should stay in the same chunk as the code block or section they belong to.
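
To make the idea concrete, here is a minimal sketch of a context-retaining splitter for Markdown input. It is not the splitter this card will deliver; the names `Chunk` and `split_with_context` are hypothetical, and it assumes the documents use `#` headings and fenced code blocks. It splits only at headings that sit outside a fence, so a YAML example and the note that follows it stay in one chunk, and it records the heading path as metadata.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = re.compile(r"^\s*(```|~~~)")


@dataclass
class Chunk:
    headers: list[str]  # heading path, outermost first (hypothetical metadata)
    text: str           # section body, with code fences kept intact


def split_with_context(markdown: str) -> list[Chunk]:
    """Split at headings, but never inside a fenced code block."""
    chunks: list[Chunk] = []
    path: list[tuple[int, str]] = []  # (level, title) of the open headings
    buf: list[str] = []
    in_fence = False

    def flush() -> None:
        body = "\n".join(buf).strip()
        if body:
            chunks.append(Chunk([title for _, title in path], body))
        buf.clear()

    for line in markdown.splitlines():
        if FENCE.match(line):
            # Toggle fence state; the fence line itself belongs to the chunk.
            in_fence = not in_fence
            buf.append(line)
            continue
        match = None if in_fence else HEADING.match(line)
        if match:
            flush()  # close the previous section under its own heading path
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            buf.append(line)
    flush()
    return chunks
```

A line such as `# a comment` inside a YAML fence is treated as content rather than as a heading, which is the behavior a purely header-based split can get wrong.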
       
      Deliverables

      • Evaluate the quality of retrievals when using the MarkdownHeaderTextSplitter for creating chunks for embeddings
        • Compare against retrievals produced with other splitters:
          • Semantic Chunking
          • RecursiveCharacterTextSplitter
          • CodeTextSplitter
      • Evaluate the quality of retrievals when using a custom splitter function for defining chunks for embeddings
        • Contextualization of code blocks, lists, tables, notes, sections, and images
      • Document the findings on retrieval improvements obtained from chunking based on the context of the document
      • Update the text splitter in the RAG embedding pipeline based on the findings
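
One way the splitter comparison in the deliverables could be framed is as a hit rate over a small set of query and expected-snippet pairs. The sketch below is only an illustration under stated assumptions: a token-overlap score stands in for the real embedding retriever, and `hit_rate`, `retrieve`, `fixed_size`, and `by_blank_line` are hypothetical names, not part of the pipeline or of any library.

```python
import re
from typing import Callable, List, Tuple

Splitter = Callable[[str], List[str]]


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: List[str], k: int = 1) -> List[str]:
    # Stand-in retriever (assumption): rank chunks by shared tokens.
    q = tokens(query)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]


def hit_rate(splitter: Splitter, document: str,
             cases: List[Tuple[str, str]], k: int = 1) -> float:
    """Fraction of queries whose top-k chunks contain the expected snippet."""
    chunks = splitter(document)
    hits = sum(
        any(expected in chunk for chunk in retrieve(query, chunks, k))
        for query, expected in cases
    )
    return hits / len(cases)


def fixed_size(size: int) -> Splitter:
    # Naive baseline: cut every `size` characters, ignoring structure.
    return lambda doc: [doc[i:i + size] for i in range(0, len(doc), size)]


def by_blank_line(doc: str) -> List[str]:
    # Structure-aware baseline: keep each paragraph with its example.
    return [part for part in doc.split("\n\n") if part.strip()]
```

Running `hit_rate` once per candidate splitter over the same document and cases gives directly comparable numbers. In the actual evaluation, the stand-in retriever would be replaced by the embedding model and vector store the pipeline uses.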

            gausingh@redhat.com Gaurav Singh
            wcabanba@redhat.com William Caban