- Feature
- Resolution: Unresolved
- Critical
Feature Overview (mandatory - Complete while in New status)
We want to provide one or more modular pre-processing pipelines that take input documents and/or qna.yamls and produce a dataset that is ready for SDG consumption.
The notebooks need to be customizable, easy to understand (with user docs, etc.), and transparent. This is driven by consistent feedback from the field and prospective customers: we need to provide output and feedback at every possible step.
Possible flow of notebook 1:
Input: user documents (folder)
Output: converted documents (docling document json/folder of converted docs)
Notebook 2:
Input: Domain/use case, user documents
Output: Folder of auto-generated q&a yamls & source docs - formatted appropriately (i.e. per domain)
Alternate notebook 3:
Input: Input docs, user-written qna.yamls
Output: Folder of correctly-linted and fixed q&a yamls & source docs - formatted appropriately (i.e. per domain)
Notebook 4:
Input: Folder of qna.yamls and converted documents
Output: Chunked dataset (hybrid chunker invoked according to the use case: RAG, SDG, etc.)
At this point, this becomes the input to SDG/RAG.
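As a rough illustration of the "output and feedback at every possible step" idea for notebook 1, the sketch below converts a folder of documents and returns a per-file report. It is a minimal sketch under stated assumptions: `convert_folder`, `StepReport`, and the `convert_fn` callable are hypothetical names, and `convert_fn` stands in for whatever converter the notebook ends up using (for example, a thin wrapper around docling); no specific docling API is assumed here.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class StepReport:
    """Per-document feedback shown to the user after a step (hypothetical type)."""
    source: str
    ok: bool
    detail: str


def convert_folder(
    input_dir: Path,
    output_dir: Path,
    convert_fn: Callable[[Path], dict],
) -> list[StepReport]:
    """Convert every file in input_dir and write one JSON file per document.

    convert_fn is an assumed, user-supplied callable that turns a source
    file into a JSON-serializable dict.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    reports: list[StepReport] = []
    for src in sorted(p for p in input_dir.iterdir() if p.is_file()):
        try:
            doc = convert_fn(src)
            out = output_dir / f"{src.stem}.json"
            out.write_text(json.dumps(doc, indent=2), encoding="utf-8")
            reports.append(StepReport(src.name, True, f"wrote {out.name}"))
        except Exception as exc:
            # Record the failure instead of aborting, so the user sees
            # the status of every document in the folder.
            reports.append(StepReport(src.name, False, f"{type(exc).__name__}: {exc}"))
    return reports
```

Returning a report per document, rather than stopping at the first failure, lets a notebook cell print a summary table so users can see which inputs need attention before moving on to the next notebook.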
Goals (mandatory - Complete while in New status)
Modularity and usability.
Requirements (mandatory - Complete while in Refinement status):
Requirement | Notes | isMVP? |
---|---|---|
1. Clear indication of prerequisites: which packages need to be installed and the required versions of the packages, Python, docling, etc. | | |
2. Clear, elaborate, simple user documents and code comments | | |
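One way requirement 1 could surface inside a notebook is a first cell that checks the environment and reports what is missing. The following is a minimal sketch using only the standard library; the function name `check_prereqs`, the package list, and the minimum Python version are illustrative assumptions, not decided values.

```python
import sys
from importlib import metadata

# Assumed values for illustration only; the real list and versions are
# still to be defined by the requirement above.
MIN_PYTHON = (3, 10)
REQUIRED_PACKAGES = ["docling"]


def check_prereqs(packages, min_python):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(n) for n in sys.version_info[:2])
        wanted = ".".join(str(n) for n in min_python)
        problems.append(f"python: found {found}, need >= {wanted}")
    for name in packages:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
        else:
            # Echo the installed version so the user sees what is in use.
            print(f"{name}=={installed}")
    return problems
```

A notebook cell could call `check_prereqs(REQUIRED_PACKAGES, MIN_PYTHON)` and print each returned problem before any processing starts.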
Questions to Answer (Initial completion while in Refinement status):
Include a list of refinement / architectural questions that may need to be answered before coding can begin.
- How do we create a folder of documents, given that qna.yamls need a reference to a git repo? How is the git elephant in the room handled? Do we expect users to run git init after they download the folder of docs?
- Calling the docling serve API and providing users a way to visualize the rendered jsonl - where and how do they 'fix' them in this flow?
- Today, our supported pre-processing tool is docling. There are investigations underway (Jira to be linked) to evaluate how to support third-party tools (like unstructured.io). If this is feasible, there needs to be a component of schema validation of the input dataset prior to SDG hand-off.
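To make the schema-validation question in the last bullet concrete, here is a minimal sketch of a check that could run before SDG hand-off, regardless of which pre-processing tool produced the dataset. The field names in `REQUIRED_FIELDS` and the helper `validate_records` are hypothetical; the actual SDG input schema is not specified in this feature.

```python
import json

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"document": str, "chunk": str}


def validate_records(records, required=REQUIRED_FIELDS):
    """Return a list of error strings; an empty list means the data passed."""
    errors = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            errors.append(f"record {i}: expected an object, got {type(rec).__name__}")
            continue
        for field, expected in required.items():
            if field not in rec:
                errors.append(f"record {i}: missing field '{field}'")
            elif not isinstance(rec[field], expected):
                errors.append(
                    f"record {i}: field '{field}' should be {expected.__name__}, "
                    f"got {type(rec[field]).__name__}"
                )
    return errors


def validate_jsonl(text):
    """Parse JSONL text (one JSON value per non-empty line) and validate it."""
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return validate_records(records)
```

Collecting all errors rather than raising on the first one fits the transparency goal: users get a complete list of what to fix in a single pass.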
Done - Acceptance Criteria (mandatory - Complete while in Refinement status):
Acceptance Criteria articulates and defines the value proposition - what is required to meet the goal and intent of this Feature. The Acceptance Criteria provides a detailed definition of scope and the expected outcomes - from a user's point of view.
…
<your text here>
Use Cases - i.e. User Experience & Workflow: (Initial completion while in Refinement status):
Include use case diagrams, main success scenarios, alternative flow scenarios.
<your text here>
Out of Scope (Initial completion while in Refinement status):
High-level list of items or personas that are out of scope.
<your text here>
Documentation Considerations (Initial completion while in Refinement status):
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation.
<your text here>
Background and Strategic Fit (Initial completion while in Refinement status):
Provide any additional context needed to frame the feature.
<your text here>
Customer Considerations (Initial completion while in Refinement status):
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.
<your text here>
Team Sign Off (Completion while in Planning status)
- All required Epics (known at the time) are linked to this Feature
- All required Stories, Tasks (known at the time) for the most immediate Epics have been created and estimated
- Add - Reviewer's name, Team Name
- Acceptance == Feature as “Ready” - well understood and scope is clear - Acceptance Criteria (scope) is elaborated, well defined, and understood
- Note: Only set FixVersion/s: on a Feature if the delivery team agrees they have the capacity and have committed that capability for that milestone
Reviewed By | Team Name | Accepted | Notes |
- …
- is depended on by: RHELAI-3711 Official InstructLab Reference Notebooks (Closed)