Feature
Resolution: Done
Normal
Product / Portfolio Work
Background
A high-quality RAG process focuses on three areas of optimization:
1. Contextualized splitter function
2. Embedding techniques and rich metadata
3. Retrieval techniques
This Feature card covers the first area, the contextualized splitter function. The idea is to adopt a splitter function that retains context within chunks. For example, a YAML example should stay whole within a single chunk, and notes in a document should stay in the same chunk as the code block or section they belong to.
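As an illustration of what "retaining context" could mean in practice, here is a minimal sketch of a section-based splitter. It is not the splitter this card proposes to adopt: the function name `split_by_section` and the chunk dictionary layout are hypothetical, and it uses only the standard library. It shows one property we care about: lines inside a fenced code block are never treated as headers, so a YAML example and the note above it stay in the same chunk.

```python
import re

# Markdown code-fence marker, built this way to avoid nesting fences in this example.
FENCE = "`" * 3
HEADER = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_section(markdown_text):
    """Split Markdown into one chunk per header section (hypothetical helper).

    Lines inside a fenced code block are never treated as headers, so a
    YAML example (including its '#' comments) stays in its section's chunk.
    Each chunk also carries the trail of headers it sits under.
    """
    chunks = []
    path = []   # current header trail as (level, title) pairs
    body = []   # lines collected for the current section
    in_fence = False

    def flush():
        text = "\n".join(body).strip()
        if text:
            chunks.append({"headers": [title for _, title in path], "text": text})
        body.clear()

    for line in markdown_text.splitlines():
        if line.lstrip().startswith(FENCE):
            # Opening or closing fence: toggle state and keep the line in the chunk.
            in_fence = not in_fence
            body.append(line)
            continue
        match = None if in_fence else HEADER.match(line)
        if match:
            flush()
            level = len(match.group(1))
            # Drop headers at the same or a deeper level before adding the new one.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            body.append(line)
    flush()
    return chunks
```

Carrying the header trail alongside the text is one possible way to give each chunk its surrounding context at embedding time. Whether it actually improves retrieval is what the deliverables below are meant to measure.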
Deliverables
- Evaluate the quality of retrievals when using the MarkdownHeaderTextSplitter for creating chunks for embeddings
- Compare to other retrievals:
  - Semantic Chunking
  - RecursiveCharacterTextSplitter
  - CodeTextSplitter
- Evaluate the quality of retrievals when using a custom splitter function for defining chunks for embeddings
  - Contextualization of code blocks, lists, tables, notes, sections, and images
- Document with findings on improvements based on the context of the document
- Update the text splitter in the RAG embedding pipeline based on the findings
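The card does not define how "quality of retrievals" is measured. One possible approach, sketched below under stated assumptions, is a hit rate at k: the fraction of labeled queries for which the expected snippet appears whole inside one of the top-k retrieved chunks. All names here (`hit_rate_at_k`, `retrieve`, the two toy splitters) are hypothetical, and the token-overlap ranking is only a stand-in for the real embedding similarity search so that the example runs with the standard library alone.

```python
import re


def tokenize(text):
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, chunks, k=3):
    """Rank chunks by token overlap with the query.

    Assumption: this is a toy stand-in for embedding similarity search.
    """
    query_tokens = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_tokens & tokenize(c)), reverse=True)
    return ranked[:k]


def hit_rate_at_k(splitter, document, labeled_queries, k=3):
    """Fraction of queries whose expected snippet is whole in a top-k chunk."""
    chunks = splitter(document)
    hits = 0
    for query, expected in labeled_queries:
        if any(expected in chunk for chunk in retrieve(query, chunks, k)):
            hits += 1
    return hits / len(labeled_queries)


def fixed_size_splitter(document, size=30):
    """Baseline: cut every `size` characters, ignoring document structure."""
    return [document[i:i + size] for i in range(0, len(document), size)]


def paragraph_splitter(document):
    """Structure-aware baseline: one chunk per blank-line-separated paragraph."""
    return [p for p in document.split("\n\n") if p.strip()]
```

Running the same labeled queries through each candidate splitter gives one comparable number per splitter. A splitter that cuts an example in half scores lower because the expected snippet is never found whole in a single chunk.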