Task
Resolution: Done
Major
RHDH AI Sprint 3284, RHDH AI Sprint 3285, RHDH AI Sprint 3286
Task
As an engineer working on the "AI Notebooks" feature, I need to build a component that, given a URL or a pdf, doc, docx, txt, md, or json file, extracts the document text and, after a safety check, passes it on to the document RAG chunk generator.
For the doc, docx, txt, md, and json extensions, the component should clean the text and remove unnecessary tokens.
For pdf, the scope covers only native PDFs (not scanned PDFs), which are converted into a document.
For url, only the content of the specific page at that URL will be security checked and added to the vector database.
Ensure security and stability
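The flow described above (detect the source type, extract text, clean it, safety check it, hand it to the chunk generator) could be sketched roughly as follows. This is only an illustration using the Python standard library, not the actual implementation: every function name here (detect_source_kind, clean_text, extract_text, passes_safety_check, prepare_for_chunking) is hypothetical, the safety check is a stub, and the pdf, doc, docx, and url branches are left unimplemented because they would need a parser or HTTP client that the ticket does not name.

```python
import json
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File extensions listed in the ticket.
SUPPORTED_EXTENSIONS = {"pdf", "doc", "docx", "txt", "md", "json"}

# Control characters other than tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def detect_source_kind(source: str) -> str:
    """Return "url" for http(s) URLs, otherwise the lowercase file extension."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    ext = PurePosixPath(source).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported source: {source!r}")
    return ext


def clean_text(text: str) -> str:
    """Assumed cleaning rules: drop control characters, normalize whitespace."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    text = _CONTROL_CHARS.sub("", text)
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)   # at most one blank line in a row
    return text.strip()


def extract_text(kind: str, data: bytes) -> str:
    """Extract and clean text for the formats the standard library can handle."""
    if kind in ("txt", "md"):
        raw = data.decode("utf-8", errors="replace")
    elif kind == "json":
        # Parsing validates the payload; re-serializing gives a canonical string.
        raw = json.dumps(json.loads(data.decode("utf-8")),
                         ensure_ascii=False, sort_keys=True)
    else:
        # pdf, doc, docx, and url need a third-party parser or an HTTP fetch.
        raise NotImplementedError(f"no extractor sketched for {kind!r}")
    return clean_text(raw)


def passes_safety_check(text: str) -> bool:
    """Stub only: the real security check is not specified in the ticket."""
    return bool(text)


def prepare_for_chunking(source: str, data: bytes) -> str:
    """Return cleaned, checked text ready for the RAG chunk generator."""
    text = extract_text(detect_source_kind(source), data)
    if not passes_safety_check(text):
        raise ValueError(f"safety check failed for {source!r}")
    return text
```

For example, prepare_for_chunking("notes.txt", b"hi  there\n") would return the cleaned string "hi there", which is what the chunk generator would then receive.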
Background
Dependencies and Blockers
QE impacted work
Documentation impacted work
Acceptance Criteria