Epic
Resolution: Unresolved
Developer Lightspeed Standard Evaluation Dataset Creation
To Do
RHDHPLAN-261 - [Lightspeed] Evaluations - testing accuracy and efficacy across models
100% To Do, 0% In Progress, 0% Done
Epic Goal
Develop a comprehensive, standardized Q&A dataset covering the Developer Lightspeed plugin's knowledge domain, for example:
- Backstage
- Red Hat Developer Hub (RHDH)
- Kubernetes
- OpenShift
- CI/CD
- GitOps
- Pipelines
- Developer Portals
- Deployments
- Software Catalogs
- Software Templates
- TechDocs
Lightspeed-core dataset: https://gitlab.cee.redhat.com/lightspeed-core/evaluation-data
- …
Scenarios
- Define Dataset Scope: Identify key topics, user personas, and question categories to be covered (e.g., RAG-specific documentation, common RHDH tasks, troubleshooting).
- Source & Write Q&A Pairs: Collaborate with Subject Matter Experts (SMEs), documentation teams, and product managers to generate a robust list of questions and their "golden" or expected answers.
- Format Dataset: Convert the Q&A pairs into the eval_data.yaml format required by the Lightspeed Core evaluation tool.
- Documentation
- Stretch: Provide instructions for users to customize the dataset so they can evaluate their own models in future BYOK and BYO MCP scenarios.
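The "Format Dataset" scenario could look roughly like the sketch below, which turns a list of Q&A pairs into YAML text. The field names (conversation_group_id, description, turns, turn_id, query, expected_response) and the qa_NNN identifier scheme are assumptions for illustration only; the authoritative eval_data.yaml schema is whatever the Lightspeed Core evaluation tool defines and should be checked there. The QAPair class and to_eval_yaml function are hypothetical helpers, and the YAML is emitted by hand only to keep the sketch dependency-free.

```python
import json
from dataclasses import dataclass


@dataclass
class QAPair:
    """One question with its "golden" answer (hypothetical helper type)."""
    topic: str
    question: str
    expected_answer: str


def yaml_scalar(text: str) -> str:
    # For ordinary text, a JSON string literal is also a valid YAML
    # double-quoted scalar, so json.dumps handles quoting and escaping.
    return json.dumps(text, ensure_ascii=False)


def to_eval_yaml(pairs: list[QAPair]) -> str:
    """Render Q&A pairs as YAML text. Field names are ASSUMED, not confirmed."""
    lines = []
    for index, pair in enumerate(pairs, start=1):
        group_id = f"qa_{index:03d}"  # assumed identifier scheme
        lines.append(f"- conversation_group_id: {yaml_scalar(group_id)}")
        lines.append(f"  description: {yaml_scalar(pair.topic)}")
        lines.append("  turns:")
        lines.append("    - turn_id: 1")
        lines.append(f"      query: {yaml_scalar(pair.question)}")
        lines.append(f"      expected_response: {yaml_scalar(pair.expected_answer)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        QAPair(
            topic="Software Catalogs",
            question="How do I register a component in the software catalog?",
            expected_answer="<golden answer supplied by an SME>",
        )
    ]
    print(to_eval_yaml(sample), end="")
```

In practice a YAML library would be a safer choice than hand-written output, and the Q&A pairs would be loaded from wherever the SMEs maintain them rather than defined inline.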
Acceptance Criteria (Mandatory)
- CI - MUST be running successfully, with automated tests
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
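One way the CI criterion could apply to the dataset itself is a small automated check that fails the build when an entry is malformed. The sketch below is only an illustration: the entry keys (topic, question, expected_answer), the find_problems function, and the specific rules are assumptions, not part of the Lightspeed Core tooling. The topic list is taken from the Epic Goal above.

```python
# Topics taken from the Epic Goal list; adjust as the dataset scope is defined.
ALLOWED_TOPICS = {
    "Backstage", "Red Hat Developer Hub (RHDH)", "Kubernetes", "OpenShift",
    "CI/CD", "GitOps", "Pipelines", "Developer Portals", "Deployments",
    "Software Catalogs", "Software Templates", "TechDocs",
}


def find_problems(entries: list[dict], allowed_topics: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means the check passed.

    Each entry is ASSUMED to be a dict with 'topic', 'question' and
    'expected_answer' keys (hypothetical layout for this sketch).
    """
    problems = []
    seen_questions = set()
    for index, entry in enumerate(entries):
        question = (entry.get("question") or "").strip()
        answer = (entry.get("expected_answer") or "").strip()
        topic = entry.get("topic")

        if not question:
            problems.append(f"entry {index}: empty question")
        if not answer:
            problems.append(f"entry {index}: empty expected answer")
        if topic not in allowed_topics:
            problems.append(f"entry {index}: unknown topic {topic!r}")

        # Treat questions that differ only by case as duplicates.
        key = question.casefold()
        if question and key in seen_questions:
            problems.append(f"entry {index}: duplicate question")
        seen_questions.add(key)
    return problems


if __name__ == "__main__":
    import sys

    sample = [
        {"topic": "GitOps", "question": "Example question?",
         "expected_answer": "<golden answer supplied by an SME>"},
    ]
    found = find_problems(sample, ALLOWED_TOPICS)
    for problem in found:
        print(problem)
    sys.exit(1 if found else 0)
```

A CI job could run a script like this against the dataset and rely on the non-zero exit code to fail the pipeline.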
Dependencies (internal and external)
- ...
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- Acceptance criteria are met
- Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
- User Journey automation is delivered
- Support and SRE teams are provided with enough skills to support the feature in production environment
- clones: RHIDP-9989 Lightspeed Evaluation Tool Integration & Setup (In Progress)
- is cloned by: RHIDP-9997 Model Benchmarking & Baseline Establishment (New)