• Type: Story
    • Resolution: Unresolved
    • Priority: Major
    • 1.10.0
    • None
    • AI

      Story

      Run the Lightspeed eval tool against Developer Lightspeed 1.9.
      The evaluation will run 2-3 large/medium models and 2 small models for comparison.
      The evaluation procedure will be the same standard one for each model, and the results will be collected and published internally as part of this issue.

      Background

      With the evaluation framework delivered in 1.9 (https://issues.redhat.com/browse/RHDHPLAN-261), we also want to run the evaluation against the Developer Lightspeed 1.9 release.

      Dependencies and Blockers

      https://issues.redhat.com/browse/RHIDP-11530 needs to be completed first, so that we have the dataset for running the evaluation.

      Acceptance Criteria

      Select the models for testing: run 2-3 large/medium models and 2 small models for evaluation.

      3 large/medium models:
      • gemini-2.5-pro
      • gpt-oss:120b
      • llama4:scout

      2 small models:
      • llama3:8b
      • gemini-2.5-flash-lite

      A standard format for reports should be defined as part of this work.
      Reports should be generated in this standard format and published internally.
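The "standard format" for reports could be as simple as one structured record per evaluated model, so results across large/medium and small models compare directly. A minimal sketch of such a record is below; the field names, the pass-rate metric, and the example numbers are illustrative assumptions, not the actual output of the Lightspeed eval tool.

```python
# Hypothetical per-model evaluation report record (assumed schema, not the
# real Lightspeed eval tool format).
from dataclasses import dataclass, asdict
import json


@dataclass
class ModelEvalReport:
    model: str        # model identifier, e.g. "gemini-2.5-pro"
    size_class: str   # "large/medium" or "small"
    total_cases: int  # number of evaluation cases run
    passed: int       # cases that met the evaluation criteria

    @property
    def pass_rate(self) -> float:
        # Fraction of cases passed; guards against an empty run.
        return self.passed / self.total_cases if self.total_cases else 0.0

    def to_json(self) -> str:
        # Serialize the record plus the derived metric for publishing.
        record = asdict(self)
        record["pass_rate"] = round(self.pass_rate, 3)
        return json.dumps(record)


# Example: one record per model makes the published results directly comparable.
report = ModelEvalReport(model="gemini-2.5-pro", size_class="large/medium",
                         total_cases=200, passed=170)
print(report.to_json())
```

Emitting one such JSON record per model keeps the internally published results machine-readable and easy to tabulate across releases.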

              Assignee: Unassigned
              Reporter: yangcao (Stephanie Cao)
              RHDH AI