Feature
Resolution: Unresolved
OCPSTRAT-895 OpenShift Lightspeed GA
50% To Do, 50% In Progress, 0% Done
Background
A high-quality RAG process focuses on three areas of optimization:
1. Contextualized splitter function
2. Embedding techniques and rich metadata
3. Retrieval techniques
This Feature card covers the first area, the contextualized splitter function. The idea is to adopt a splitter function that retains context within each chunk: for example, keep a complete YAML example in a single chunk, and keep notes in the same chunk as the code block or section they belong to.
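To make the idea concrete, here is a minimal sketch of a context-retaining splitter for Markdown, using only the Python standard library. It is not the splitter this card will deliver and not the LangChain implementation; the names `Chunk` and `split_by_headers` are hypothetical. It assumes headers use the `#` syntax and code blocks use triple-backtick fences.

```python
import re
from dataclasses import dataclass

HEADER_RE = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    headers: list  # header path of the section, e.g. ["Deploy", "Example"]
    text: str      # section body, with fenced code blocks left intact


def split_by_headers(markdown):
    """Split Markdown into one chunk per section (hypothetical helper).

    Lines inside a fenced code block are never treated as headers, so a
    YAML example and the notes around it stay in the same chunk.
    """
    chunks = []
    path = []        # stack of (level, title) for the current section
    buf = []         # body lines of the current section
    in_fence = False

    def flush():
        body = "\n".join(buf).strip()
        if body:
            chunks.append(Chunk([title for _, title in path], body))
        buf.clear()

    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence
            buf.append(line)
            continue
        match = None if in_fence else HEADER_RE.match(line)
        if match:
            flush()  # close the previous section before changing the path
            level = len(match.group(1))
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, match.group(2).strip()))
        else:
            buf.append(line)
    flush()
    return chunks
```

The header path stored on each chunk is one simple form of metadata that can be embedded or filtered on later; size limits and handling of lists, tables, and images are deliberately left out of this sketch.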
Deliverables
- Evaluate the quality of retrievals when using the MarkdownHeaderTextSplitter for creating chunks for embeddings
  - Compare to other retrievals:
    - Semantic Chunking
    - RecursiveCharacterTextSplitter
    - CodeTextSplitter
- Evaluate the quality of retrievals when using a custom splitter function for defining chunks for embeddings
  - Contextualization of code blocks, lists, tables, notes, sections, and images
- Document with findings on improvements based on the context of the document
- Update the text splitter in the RAG embedding pipeline based on findings
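One possible way to compare splitters on retrieval quality is a hit rate at k: for each test question, check whether any of the top-k retrieved chunks contains the expected answer text. The sketch below is an assumption about how such an evaluation could look, not the agreed method for this card. Token overlap stands in for embedding similarity so the example runs without a model, and all function names are hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercased word counts; a stand-in for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def score(query, chunk):
    """Token-overlap score; a real pipeline would use vector similarity."""
    q, c = tokens(query), tokens(chunk)
    return sum(min(q[t], c[t]) for t in q)


def retrieve(query, chunks, k=1):
    """Return the k chunks with the highest score for the query."""
    ranked = sorted(chunks, key=lambda ch: score(query, ch), reverse=True)
    return ranked[:k]


def hit_rate(chunks, cases, k=1):
    """Fraction of (query, expected_text) cases answered by the top-k chunks."""
    hits = sum(
        any(expected in ch for ch in retrieve(query, chunks, k))
        for query, expected in cases
    )
    return hits / len(cases)


def fixed_size_split(text, size):
    """Baseline splitter that cuts every `size` characters, ignoring context."""
    return [text[i:i + size] for i in range(0, len(text), size)]
```

Running `hit_rate` over the chunks produced by each candidate splitter, with the same set of question and expected-answer pairs, gives one number per splitter to include in the findings document. The test questions themselves would need to come from real documentation queries.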