Issue Type: Spike
Resolution: Unresolved
Priority: Undefined
The overall mission for this work item is to run the following comparisons:
- granite-starter (InstructLab-trained) + RAG vs. llama (off-the-shelf, not trained) with no RAG, potentially also adding mistral (off-the-shelf, not trained) with no RAG.
- granite-starter (InstructLab-trained) + RAG vs. llama (off-the-shelf, not trained) + RAG vs. mistral (off-the-shelf, not trained) + RAG.
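For reference, a minimal sketch of that comparison matrix expressed as plain Python; the model names are placeholders, not the exact checkpoints or serving endpoints that will be used:

```python
# Sketch of the planned comparisons; model names are placeholders only.
comparisons = {
    "trained_plus_rag_vs_untrained_no_rag": [
        {"model": "granite-starter (InstructLab-trained)", "rag": True},
        {"model": "llama (off-the-shelf)", "rag": False},
        {"model": "mistral (off-the-shelf)", "rag": False},  # optional addition
    ],
    "trained_plus_rag_vs_untrained_plus_rag": [
        {"model": "granite-starter (InstructLab-trained)", "rag": True},
        {"model": "llama (off-the-shelf)", "rag": True},
        {"model": "mistral (off-the-shelf)", "rag": True},
    ],
}
```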
Tasks include:
- Getting access to the documents for as many POCs as possible.
- For each of them, creating a benchmark data set large enough to reliably measure distinctions like the ones requested in this work item.
- Standing up a RAG capability for conducting the tests. Note that some of the POCs are heavily focused on tables, so the capability needs to be reasonably competent at extracting answers from tables. There are conflicting examples from IBM about how to do that well using Docling, so more investigation is needed in this area (see the sketch after this list).
- Measuring how effective the models are on the benchmark data sets.
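As a starting point for the table-handling investigation, here is a minimal sketch assuming Docling's DocumentConverter API and pandas; the file path is a placeholder, and whether keeping tables as separate chunks is the right approach is exactly the open question:

```python
# A minimal sketch, assuming Docling's DocumentConverter API; "poc_document.pdf"
# is a placeholder path, not one of the actual POC documents.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("poc_document.pdf")
doc = result.document

# Keep each table intact as its own chunk (via a pandas DataFrame) rather than
# flattening it into surrounding page text, on the assumption that this helps
# the retriever answer table-focused questions.
table_chunks = []
for i, table in enumerate(doc.tables):
    df = table.export_to_dataframe()
    table_chunks.append(f"Table {i}:\n{df.to_markdown(index=False)}")

# The rest of the document goes in as ordinary markdown text chunks.
text_chunks = [c for c in doc.export_to_markdown().split("\n\n") if c.strip()]
```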
Steven asked Mo to staff this, and Mo assigned it to me. Since the request didn't come through the PMs, no PM made a Jira entry for it, so Mo told me I was welcome to create one of my own; this is that Jira entry. I'm leaving the priority undefined for now because there hasn't been any clear indication of what the priority should be.