Type: Story
Resolution: Unresolved
Priority: Major
Story
Run Lightspeed eval tool against Developer Lightspeed 1.9
The evaluation will run 2-3 large/medium models and 2 small models to compare against each other.
The evaluation result for each model will follow a standard format; the results will be collected and internally published as part of this issue.
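For illustration only, a driver for this comparison could loop over the candidate models and collect one result set per model. The model identifiers, dataset path, and run_eval() helper below are assumptions standing in for the real Lightspeed eval tool, not its actual API:
{code:python}
# Hypothetical sketch only: run_eval() stands in for the real Lightspeed
# eval tool, and the dataset path is a placeholder for the RHIDP-11530 data.
import json

CANDIDATE_MODELS = [
    # large/medium models
    "gemini-2.5-pro",
    "gpt-oss:120b",
    "llama4:scout",
    # small models
    "llama3:8b",
    "gemini-2.5-flash-lite",
]


def run_eval(model: str, dataset_path: str) -> dict:
    # Stand-in for invoking the eval tool against one model; returns a
    # dummy record so the sketch runs end to end.
    return {"model": model, "dataset": dataset_path, "scores": {}}


def main() -> None:
    results = {model: run_eval(model, "rhidp-11530-dataset.jsonl")
               for model in CANDIDATE_MODELS}
    # Collect all per-model results into a single artifact for publishing.
    with open("eval-results.json", "w") as fh:
        json.dump(results, fh, indent=2)


if __name__ == "__main__":
    main()
{code}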
Background
With the evaluation framework available in 1.9 (https://issues.redhat.com/browse/RHDHPLAN-261), we also want to run the evaluation against the Developer Lightspeed 1.9 release.
Dependencies and Blockers
https://issues.redhat.com/browse/RHIDP-11530 needs to be done first, so that we have the dataset for running the evaluation.
Acceptance Criteria
Select the models for testing, then run 2-3 medium/large models and 2-3 small models for evaluation:
3 large/medium models:
gemini-2.5-pro
gpt-oss:120b
llama4:scout
2 small models:
llama3:8b
gemini-2.5-flash-lite
A standard format for reports should be created as part of this work (one possible shape is sketched below).
Reports should be generated in this standard format and internally published.
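As a starting point for that report standard, one possible per-model record is sketched below; every field name here is an assumption, to be replaced by whatever the agreed standard defines:
{code:python}
# Hypothetical per-model report record; field names are illustrative only.
from dataclasses import dataclass, field


@dataclass
class ModelEvalReport:
    model: str                  # e.g. "gemini-2.5-pro"
    size_class: str             # "large/medium" or "small"
    dataset: str                # dataset from RHIDP-11530
    lightspeed_version: str     # e.g. "1.9"
    scores: dict = field(default_factory=dict)  # metric name -> value
    notes: str = ""             # free-form observations
{code}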