Type: Feature
Resolution: Done
Priority: Normal
Fix Version/s: ols-2.0
Affects Version/s: None
Component/s: Lightspeed
Labels:
- OLS

Activity Type:
Product / Portfolio Work
Parent Link:
OCPSTRAT-2123 OpenShift Lightspeed 2.0
Blocked:
False
Blocked Reason:

None
Ready:
False
Size:
None

Target Version:
None
Release Blocker:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Priority Data:
None
PX Impact Score:
PX Technical Impact:
None
PX Impact Range:
None
PX Scheduling Request:
None
PX Technical Impact Notes:
None

Intelligence Requested:
Market:

Background
A high-quality RAG process focuses on three areas of optimization:

1. Contextualized splitter function
2. Embedding techniques and rich metadata
3. Retrieval techniques

This Feature card covers point 1. The idea is to adopt a splitter function that retains context within chunks. For example, a YAML example should stay within a single chunk, and notes in a document should stay with the code block or section they belong to.
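To make the idea concrete, here is a minimal sketch of a context-retaining splitter. It is not the splitter adopted in the linked pull requests; the function names (`split_blocks`, `chunk_with_context`) and the `max_chars` budget are hypothetical, and it assumes Markdown-style input with fenced code blocks. It never cuts inside a fence, and it keeps a fenced block attached to the paragraph that introduces it.

```python
from typing import List

FENCE = "`" * 3  # Markdown code-fence marker, built this way to keep the example self-contained


def split_blocks(text: str) -> List[str]:
    """Split text on blank lines, but treat a fenced code block as one unbreakable block."""
    blocks: List[str] = []
    current: List[str] = []
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # toggles on both the opening and the closing fence
        if not line.strip() and not in_fence:
            if current:
                blocks.append("\n".join(current))
                current = []
        else:
            current.append(line)  # blank lines inside a fence are kept
    if current:
        blocks.append("\n".join(current))
    return blocks


def chunk_with_context(text: str, max_chars: int = 400) -> List[str]:
    """Greedily pack blocks into chunks without separating code from its introduction."""
    # Assumption: the block right before a fence is the text that explains it.
    units: List[str] = []
    for block in split_blocks(text):
        if block.lstrip().startswith(FENCE) and units:
            units[-1] += "\n\n" + block
        else:
            units.append(block)

    chunks: List[str] = []
    current = ""
    for unit in units:
        candidate = unit if not current else current + "\n\n" + unit
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = unit  # an oversized unit becomes its own chunk rather than being cut
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    doc = "\n".join([
        "# Deploy", "",
        "Apply the following manifest:", "",
        FENCE + "yaml", "apiVersion: v1", "", "kind: ConfigMap", FENCE, "",
        "Then verify the result.",
    ])
    for i, chunk in enumerate(chunk_with_context(doc, max_chars=60)):
        print(f"--- chunk {i} ---\n{chunk}")
```

The trade-off in this sketch is that chunk size becomes a soft limit: a long YAML example produces a chunk larger than `max_chars` instead of being cut in the middle.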

Deliverables

- Evaluate the quality of retrievals when using the MarkdownHeaderTextSplitter to create chunks for embeddings
  - Compare against retrievals produced with other splitters:
    - Semantic Chunking
    - RecursiveCharacterTextSplitter
    - CodeTextSplitter
- Evaluate the quality of retrievals when using a custom splitter function to define chunks for embeddings
  - Contextualization of code blocks, lists, tables, notes, sections, and images
- A document with the findings on improvements based on the context of the document
- Update the text splitter in the RAG embedding pipeline based on the findings
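One way the splitter comparison could be scored is a hit rate at k: for each test question, check whether any of the top-k retrieved chunks contains the passage a reviewer expects. The sketch below is only an illustration under stated assumptions. The token-overlap `retrieve` function is a stand-in for embedding similarity, the two toy splitters are placeholders for the splitters listed above, and the corpus and test cases are invented for the example.

```python
import re
from typing import Callable, List, Set, Tuple

Splitter = Callable[[str], List[str]]


def tokens(text: str) -> Set[str]:
    """Lowercased word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: List[str], k: int) -> List[str]:
    """Rank chunks by token overlap with the question (assumption: replaces vector search)."""
    q = tokens(question)
    ranked = sorted(chunks, key=lambda c: len(q & tokens(c)), reverse=True)
    return ranked[:k]


def hit_rate(splitter: Splitter, corpus: str, cases: List[Tuple[str, str]], k: int = 1) -> float:
    """Fraction of questions whose expected passage appears in a top-k chunk."""
    chunks = splitter(corpus)
    hits = sum(
        any(expected in chunk for chunk in retrieve(question, chunks, k))
        for question, expected in cases
    )
    return hits / len(cases)


def fixed_size(text: str, size: int = 40) -> List[str]:
    """Toy baseline: cut every `size` characters regardless of structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def by_paragraph(text: str) -> List[str]:
    """Toy structure-aware splitter: one chunk per blank-line-separated paragraph."""
    return [p for p in text.split("\n\n") if p.strip()]


# Hypothetical corpus and test cases, invented for this example.
CORPUS = (
    "To scale a deployment, set the replicas field.\n\n"
    "replicas: 3\n\n"
    "To expose a service, set the type field.\n\n"
    "type: NodePort"
)
CASES = [
    ("How do I scale a deployment?", "replicas field"),
    ("How do I expose a service?", "type field"),
]

if __name__ == "__main__":
    for name, splitter in [("fixed_size", fixed_size), ("by_paragraph", by_paragraph)]:
        print(name, hit_rate(splitter, CORPUS, CASES, k=1))
```

A real evaluation would swap in the actual embedding model and a curated question set; the scoring loop itself can stay the same, so every splitter is judged on identical questions.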

links to

openshift/lightspeed-rag-content#41: OLS-558: modify rag chunking

openshift/lightspeed-rag-content#197: OLS-601 Improved RAG content splitting

openshift/lightspeed-rag-content#237: OLS-1499 Add script for removing ballast content from OpenShift documentation

openshift/lightspeed-rag-content#289: OLS-1651 Add script for fetching OpenShift documentation in the HTML format

Assignee: Gaurav Singh
Reporter: William Caban
Need Info From: None
Contributors: None
Architect: None
QA Contact: None
Doc Contact: None
Product Operations Engineering Contact: None
Votes: 0
Watchers: 4
Created: 2024/02/18 7:21 AM
Updated: 2025/09/02 9:16 PM
Resolved: 2025/06/17 5:45 PM
