Red Hat Enterprise Linux AI / RHELAI-3896

Document Gathering: Define Documentation Sources and Ingestion Strategy

    • Type: Feature
    • Resolution: Done
    • Fit and Finish

      Goal
      Begin an exploratory effort to understand where relevant user documentation currently resides and how it can be integrated into our ingestion pipeline. This includes identifying key sources, understanding the ingestion requirements, and mapping out the experience for different personas interacting with documentation.

      Out of scope
      We do not want to get into the practice of document management.

      Problem statement
      [Fill in]

      Background
      Today our workflows handle one document at a time. How do we scale? How do we enable SMEs, data scientists, and AI engineers to bring their documents at scale?

      Myriam and Adele are hearing from customers who are interested in SDG (synthetic data generation) and in how it would integrate with the pipeline.

      Who can we connect with to gather insights?

      • Can we work with Myriam and Adel to get access to the customers they have spoken with?
      • Can we leverage the docling upstream community? Could we use a quick, informal survey for this?
      • Are there any internal teams we can talk to?
        • DDIS (Faisal Shah, DS, DDIS)
        • OpenShift Lightspeed (Anxhela, OLS, Docs)

      Key Questions to address

      • What might it look like if a folder of docs is uploaded? How might that scale to connecting to existing data sources?
      • Where do user docs currently live (internally and externally)?
      • How can they be added to the ingestion pipeline effectively?
      • What would the document ingestion experience look like for various personas?
      • Which personas are involved?
      • Do we use APIs, plug-ins, integrations?
      • How does docling factor in?
      • How do we handle vectorized versus non-vectorized data?
      • Is the docling team going to build plugins to integrate with sources? If so, what is the timeline, which sources are they targeting, and how could we leverage them?
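
      To make the "folder of docs" question more concrete, here is a minimal sketch of what a first pass over an uploaded folder could look like: walk the folder and split files into those a pipeline might accept and those it would skip. This is an illustration only, not a description of the existing pipeline. The function names and the list of supported extensions are hypothetical, and the sketch uses only the Python standard library rather than any docling API.

```python
from collections import Counter
from pathlib import Path

# Hypothetical set of extensions the pipeline would accept (an assumption,
# not taken from the ticket or from docling).
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".md", ".html"}


def build_manifest(root):
    """Recursively list files under root, split into supported and skipped."""
    supported, skipped = [], []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue  # ignore directories
        if path.suffix.lower() in SUPPORTED_SUFFIXES:
            supported.append(path)
        else:
            skipped.append(path)
    return supported, skipped


def summarize(paths):
    """Count files per lowercased extension, e.g. for a per-persona report."""
    return Counter(p.suffix.lower() for p in paths)
```

      A manifest like this could be shown to the person uploading the folder before anything is ingested, which is one way to surface unsupported files early.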

      Acceptance criteria

      • Work with jepandit@redhat.com to answer "Who can we talk to in order to understand this?"
        • Connecting with the RHOAI team might help here. It sounds like some customers are interested in bringing or building an ingestion pipeline.
      • Documentation sources identified and categorized
      • Personas involved in the doc lifecycle mapped out
      • High-level experience flow for doc ingestion defined per persona
      • Summary of findings shared with the team

      Resources
      Meeting notes from May 5

      Deliverables
      [Fill in]

      Next steps

      • Decide how to timebox this effort, including how long to keep the survey running.
      • Start with the docling community survey, then schedule a few internal conversations with the teams listed above.

              jingfutan Jingfu Tan
              mehall-1 Megan Hall
              Jehlum Vitasta Pandit