Type: Epic
Resolution: Done
Priority: Undefined
Fix Version/s: 1.9.0
Affects Version/s: None
Component/s: model-catalog, Release
Labels:
None

Epic Name:
Model Benchmarking & Baseline Establishment
Blocked:
False
Blocked Reason:
None
Ready:
False
Parent Link:
RHDHPLAN-261 - [Lightspeed] Evaluations - testing accuracy and efficacy across models
Epic Status:
In Progress
Feature Link:
RHDHPLAN-261 - [Lightspeed] Evaluations - testing accuracy and efficacy across models
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done

Epic Goal

Run the evaluation suite against a range of models to establish baseline accuracy and provide model recommendations.

Note: Because only a limited number of models are accessible for comparison, and the large Q&A set was generated by AI without full manual review, the evaluation results and accuracy numbers at this stage are for internal reference only.

Why is this important?

Scenarios

Identify Candidate Models: Select the models for testing, ensuring at least one medium/large model (for cluster) and one small model (for local).

Analyze Results: Collect and analyze the accuracy reports from all model tests.

Publish Recommendations: Document and publish, internally, the baseline accuracy numbers and the recommended models for both cluster and local use.
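
The Analyze Results and Publish Recommendations scenarios above could be sketched roughly as follows. This is a minimal illustration only, not the evaluation suite's actual report format or API: the `EvalResult` record, the `accuracy_by_model` and `recommend` helpers, and the "cluster"/"local" target labels are all hypothetical names introduced here, and any numbers passed in would be placeholders rather than real results.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """One accuracy report from a single test run (hypothetical shape)."""
    model: str    # placeholder model identifier
    target: str   # assumed deployment target: "cluster" or "local"
    correct: int  # number of Q&A items answered correctly
    total: int    # number of Q&A items evaluated


def accuracy_by_model(results):
    """Sum correct/total per (target, model) and return accuracy fractions."""
    sums = defaultdict(lambda: [0, 0])
    for r in results:
        sums[(r.target, r.model)][0] += r.correct
        sums[(r.target, r.model)][1] += r.total
    # Skip models with no evaluated items to avoid dividing by zero.
    return {key: c / t for key, (c, t) in sums.items() if t > 0}


def recommend(results):
    """Pick the highest-accuracy model for each deployment target."""
    best = {}
    for (target, model), acc in accuracy_by_model(results).items():
        if target not in best or acc > best[target][1]:
            best[target] = (model, acc)
    return best
```

Accuracy alone is a simplification. A real recommendation would likely also weigh model size, latency, and resource cost, which this sketch leaves out.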

Acceptance Criteria (Mandatory)

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

Open questions:

Done Checklist

Acceptance criteria are met
Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
User Journey automation is delivered
Support and SRE teams are provided with enough skills to support the feature in production environment

clones

RHIDP-9982 Developer Lightspeed Standard Evaluation Dataset Creation

Closed

Assignee: Stephanie Cao

Reporter: Stephanie Cao

Team: RHDH AI

Votes: 0

Watchers: 1

Created: 2025/10/22 1:38 AM

Updated: 2026/01/19 5:38 PM

Resolved: 2026/01/19 5:38 PM
