Epic: Model Benchmarking & Baseline Establishment
Status: In Progress
Resolution: Unresolved
RHDHPLAN-261 - [Lightspeed] Evaluations - testing accuracy and efficacy across models
Progress: 33% To Do, 33% In Progress, 33% Done
Epic Goal
Run the evaluation suite against various models to establish baseline accuracy numbers and provide model recommendations.
Note: Since only a limited number of models are accessible for comparison, and the large Q&A set was generated by AI without full manual review, the evaluation results and accuracy numbers at this stage are for internal reference only.
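As a rough illustration of what a single evaluation run could look like, here is a minimal Python sketch that scores a model's answers against the Q&A set using a simple lexical-similarity threshold. The file format, the ask_model callable, and the scoring method are assumptions for illustration only, not the actual evaluation suite; in practice the suite may use an LLM judge or embedding-based scoring.
{code:python}
# Illustrative sketch only: the Q&A file format, the ask_model callable, and the
# lexical-similarity scoring are assumptions, not the real evaluation suite.
import json
from difflib import SequenceMatcher
from typing import Callable


def similarity(expected: str, actual: str) -> float:
    """Rough lexical similarity in [0, 1]; real evaluations may use an LLM judge or embeddings."""
    return SequenceMatcher(None, expected.lower(), actual.lower()).ratio()


def evaluate_model(qa_path: str, ask_model: Callable[[str], str], threshold: float = 0.7) -> dict:
    """Run every Q&A pair through ask_model and report the pass rate above a similarity threshold."""
    with open(qa_path, encoding="utf-8") as f:
        qa_pairs = json.load(f)  # assumed format: [{"question": ..., "answer": ...}, ...]

    passed = sum(
        1 for pair in qa_pairs
        if similarity(pair["answer"], ask_model(pair["question"])) >= threshold
    )
    return {"total": len(qa_pairs), "passed": passed, "accuracy": passed / max(len(qa_pairs), 1)}
{code}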
Why is this important?
- …
Scenarios
- Identify Candidate Models: Select the models for testing, ensuring at least one medium/large model (for cluster) and one small model (for local).
- Analyze Results: Collect and analyze the accuracy reports from all model tests.
- Publish Recommendations: Document and publish the baseline accuracy numbers internally and the recommended models for both cluster and local use (a sketch of the aggregation step follows this list).
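As an illustration of the Analyze Results and Publish Recommendations steps, the sketch below aggregates hypothetical per-model accuracy reports and picks the best-scoring model for each size class (small for local, medium/large for cluster). The report structure, size-class names, and numbers are assumptions, not the suite's actual output.
{code:python}
# Illustrative sketch only: the report structure, size classes, and numbers are assumptions.
from dataclasses import dataclass


@dataclass
class ModelReport:
    name: str
    size_class: str   # "small" (local) or "medium/large" (cluster), per the scenarios above
    accuracy: float   # fraction of Q&A pairs answered acceptably


def recommend(reports: list[ModelReport]) -> dict[str, ModelReport]:
    """Return the highest-accuracy model for each size class."""
    best: dict[str, ModelReport] = {}
    for report in reports:
        current = best.get(report.size_class)
        if current is None or report.accuracy > current.accuracy:
            best[report.size_class] = report
    return best


if __name__ == "__main__":
    # Hypothetical numbers for illustration; real baselines come from the evaluation runs.
    reports = [
        ModelReport("model-a", "medium/large", 0.81),
        ModelReport("model-b", "medium/large", 0.76),
        ModelReport("model-c", "small", 0.68),
    ]
    for target, winner in recommend(reports).items():
        print(f"{target}: {winner.name} (accuracy {winner.accuracy:.0%})")
{code}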
Acceptance Criteria (Mandatory)
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- Acceptance criteria are met
- Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
- User Journey automation is delivered
- Support and SRE teams are provided with enough skills to support the feature in a production environment
Issue links
- clones: RHIDP-9982 Developer Lightspeed Standard Evaluation Dataset Creation (In Progress)