Red Hat Enterprise Linux AI / RHELAI-3695

[ilab] Investigate ilab-trained Granite + RAG vs off-the-shelf models


    • Type: Spike
    • Resolution: Unresolved
    • Priority: Undefined

      The goal of this work item is to run the following comparisons:

      • granite-starter (InstructLab-trained) with RAG vs. llama (off-the-shelf, not trained) without RAG, and potentially also mistral (off-the-shelf, not trained) without RAG.
      • granite-starter (InstructLab-trained) with RAG vs. llama (off-the-shelf, not trained) with RAG vs. mistral (off-the-shelf, not trained) with RAG.
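One way to keep the comparison arms straight is to write them down as data. The following is a minimal sketch only: the `Arm` class, the `comparisons` and `label` helpers, and the model strings are hypothetical labels for illustration, not real model identifiers or part of any InstructLab API.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Arm:
    """One configuration under test (hypothetical structure)."""
    model: str     # descriptive label only, not a real model identifier
    trained: bool  # True if the model was tuned with InstructLab
    rag: bool      # True if retrieval is used at inference time


# The trained configuration that every comparison is measured against.
BASELINE = Arm("granite-starter", trained=True, rag=True)

# Off-the-shelf models, each without and with RAG.
CANDIDATES = [
    Arm(model, trained=False, rag=rag)
    for model, rag in product(["llama", "mistral"], [False, True])
]


def comparisons():
    """Pair the baseline with every off-the-shelf arm."""
    return [(BASELINE, candidate) for candidate in CANDIDATES]


def label(arm):
    """Human-readable name for reports."""
    tuning = "ilab-trained" if arm.trained else "off-the-shelf"
    retrieval = "RAG" if arm.rag else "no RAG"
    return f"{arm.model} ({tuning}, {retrieval})"


if __name__ == "__main__":
    for base, other in comparisons():
        print(f"{label(base)}  vs  {label(other)}")
```

Treating mistral without RAG as a regular arm is a simplification; the work item lists it only as a potential addition.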

       

      Tasks include:

      • Getting access to the documents for as many POCs as possible.
      • For each POC, creating a benchmark data set large enough to reliably measure distinctions like the ones requested in this work item.
      • Standing up a RAG capability for conducting the tests. Note that some of the POCs are heavily focused on tables, so the capability needs to be reasonably competent at extracting answers from tables. There are conflicting examples from IBM about how to do that well using Docling, so more investigation is needed in this area.
      • Measuring how effective the models are on the benchmark data sets.
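Whether a benchmark is "large enough" can be checked empirically once per-question results exist. The sketch below assumes each answer has already been graded as correct (1) or incorrect (0), and that both configurations answered the same questions in the same order. It uses a paired percentile bootstrap, which is one reasonable choice and not a method prescribed by this work item; the function names and defaults are hypothetical.

```python
import random


def accuracy(results):
    """Fraction of questions answered correctly (results are 0/1)."""
    return sum(results) / len(results)


def paired_bootstrap_diff(a, b, n_resamples=2000, seed=0):
    """Estimate accuracy(a) - accuracy(b) with a 95% percentile interval.

    a and b hold 0/1 grades for the same questions in the same order,
    so each resample draws question indices and keeps the pairing intact.
    Returns (observed_difference, ci_low, ci_high).
    """
    if len(a) != len(b) or not a:
        raise ValueError("a and b must be non-empty and the same length")
    rng = random.Random(seed)
    n = len(a)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(a[i] - b[i] for i in idx) / n)
    diffs.sort()
    ci_low = diffs[int(0.025 * n_resamples)]
    ci_high = diffs[int(0.975 * n_resamples) - 1]
    return accuracy(a) - accuracy(b), ci_low, ci_high


if __name__ == "__main__":
    # Made-up grades for illustration only, not measured results.
    trained_rag = [1] * 80 + [0] * 20
    off_the_shelf = [1] * 60 + [0] * 40
    print(paired_bootstrap_diff(trained_rag, off_the_shelf))
```

If the interval for a comparison still includes zero, the data set is probably too small to separate those two configurations at that effect size.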

       

      Steven asked Mo to staff this, and Mo assigned it to me.  Since it didn't come through the PMs, no PM made a Jira entry for it.  So Mo told me that I was welcome to make one of my own – this is that Jira entry.  I'm leaving the priority undefined for now because there hasn't been any clear indication of what the priority is.

              rh-ee-bmurdock Bill Murdock
              rh-ee-bmurdock Bill Murdock
              Laura Santamaria