-
Epic
-
Resolution: Unresolved
-
Model Benchmarking & Baseline Establishment
-
To Do
-
RHDHPLAN-261 - [Lightspeed] Evaluations - testing accuracy and efficacy across models
-
100% To Do, 0% In Progress, 0% Done
-
Epic Goal
Run the evaluation suite against a set of candidate models to establish baseline accuracy numbers and publish official model recommendations.
Why is this important?
- …
Scenarios
- Identify Candidate Models: Select the models for testing, ensuring at least one medium/large model (for cluster) and one small model (for local).
- Analyze Results: Collect and analyze the accuracy reports from all model tests.
- Publish Recommendations: Document and publish the official baseline accuracy numbers and the recommended models for both cluster and local use.
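The "Analyze Results" and "Publish Recommendations" scenarios above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ModelResult` record, the `recommend` helper, the model names, and the counts are all hypothetical placeholders, not the evaluation suite's actual report format or real results. It assumes each model run yields a count of correct answers out of a total, tagged with its deployment target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """One model's evaluation run (hypothetical record shape)."""
    model: str    # placeholder model identifier, not a real model name
    target: str   # deployment target: "cluster" or "local"
    correct: int  # number of evaluation cases answered correctly
    total: int    # number of evaluation cases run

    @property
    def accuracy(self) -> float:
        # Guard against an empty run so we never divide by zero.
        return self.correct / self.total if self.total else 0.0


def recommend(results):
    """Return the highest-accuracy result for each deployment target."""
    best = {}
    for r in results:
        current = best.get(r.target)
        if current is None or r.accuracy > current.accuracy:
            best[r.target] = r
    return best


# Placeholder numbers for illustration only; not measured results.
results = [
    ModelResult("medium-model-a", "cluster", 42, 50),
    ModelResult("large-model-b", "cluster", 45, 50),
    ModelResult("small-model-c", "local", 30, 50),
    ModelResult("small-model-d", "local", 35, 50),
]

for target, r in sorted(recommend(results).items()):
    print(f"{target}: {r.model} ({r.accuracy:.0%})")
```

Picking purely on accuracy is the simplest possible rule; a real recommendation would likely also weigh latency, cost, and resource footprint, especially for the local case.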
Acceptance Criteria (Mandatory)
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- Acceptance criteria are met
- Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
- User Journey automation is delivered
- Support and SRE teams are provided with enough skills to support the feature in production environment
-
clones RHIDP-9982 Developer Lightspeed Standard Evaluation Dataset Creation (New)