Type: Feature
Resolution: Done
Priority: Critical
Work Type: Strategic Product Work
Parent Feature: OCPSTRAT-895 OpenShift Lightspeed GA
Progress: 23% To Do, 7% In Progress, 70% Done
Program Call
Goal
Evaluate the quality of answers provided by the OpenShift Lightspeed (OLS) AI assistant for product-related questions.
Timeline
August 30, 2024
Purpose
The purpose of this feature is to develop a method for evaluating the quality of responses given by OpenShift Lightspeed. We aim to create a "golden set" of questions and answers reviewed by human experts for each product area. This set will serve as a standard of excellence, helping us compare and understand the quality of OLS outputs. The OLS team will use this internal feature to assess response quality and formulate a plan for improvement.
Overview
The OLS team has created a list of synthetically generated questions and answers for each OpenShift product area, referred to as the golden set. Each OCP team will be assigned an Epic to review and correct the list of questions related to their product area.
This golden set of questions and answers (hosted at the Q&A Document link) will serve as the baseline for evaluating the answers generated by OLS.
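For illustration only, one entry in such a golden set could be represented with a structure along the lines below; the field names and layout are assumptions for this sketch, not the actual format of the Q&A Document.

```python
# Hypothetical shape of a single golden-set entry. Field names are
# illustrative assumptions; the real Q&A Document may be organized differently.
from dataclasses import dataclass

@dataclass
class GoldenSetEntry:
    product_area: str       # e.g. "Networking", "Storage"
    question: str           # the reviewed, human-corrected question
    golden_answer: str      # the expert-approved reference answer
    reviewed_by: str = ""   # SME who validated the entry
    notes: str = ""         # optional reviewer comments

entry = GoldenSetEntry(
    product_area="Networking",
    question="How do I configure an egress IP in OpenShift?",
    golden_answer="Egress IPs are configured with the EgressIP custom resource ...",
    reviewed_by="jdoe",
)
print(entry.question)
```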
Background
To provide accurate responses, OLS takes a user prompt, retrieves relevant information from OCP documentation (retrieval-augmented generation, RAG), and then summarizes it with an LLM.
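As a rough sketch of that flow (illustrative only: the tiny in-memory corpus and the helper functions below are assumptions, not OLS code or APIs):

```python
# Minimal sketch of a retrieval-augmented generation (RAG) flow like the one
# described above. The in-memory DOCS corpus and the llm_summarize stub are
# placeholders; OLS searches real OCP documentation and calls a hosted LLM.

DOCS = [
    "Egress IPs are configured with the EgressIP custom resource.",
    "Routes expose services externally via the OpenShift router.",
]

def search_docs(prompt: str, top_k: int = 1) -> list[str]:
    """Naive keyword retrieval standing in for a real vector search."""
    words = prompt.lower().split()
    scored = sorted(DOCS, key=lambda d: -sum(w in d.lower() for w in words))
    return scored[:top_k]

def llm_summarize(prompt: str, passages: list[str]) -> str:
    """Placeholder for the LLM call that grounds the answer in the passages."""
    return f"Based on the docs: {' '.join(passages)}"

def answer(prompt: str) -> str:
    passages = search_docs(prompt)           # 1. retrieve relevant documentation
    return llm_summarize(prompt, passages)   # 2. summarize with the LLM

print(answer("How do I configure an egress IP?"))
```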
Requirement/Request
We need each product team to:
- Review and correct the questions.
- Review and correct the answers.
- Add additional relevant questions and answers.
Feedback on Received Questions
Since the questions have been synthetically generated by an AI system, we've received feedback that:
- Some questions are not related to the assigned product area. If you encounter such questions, either assign them to the relevant product team by moving the Epic to their product board or reach out to @Gaurav Singh.
- Some questions may not be valid or might appear awkward. As part of our request, please correct these questions and add more relevant ones.
How We Will Use the "Answer Quality Metrics"
Based on our findings, actions may range from low-effort tasks like updating product information in documentation to high-effort tasks like fine-tuning the model. We will evaluate and prioritize these actions accordingly.
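To make the comparison against the golden set concrete, one simple example of an answer-quality metric (an illustrative assumption, not the metric this feature prescribes) is a token-overlap F1 score between an OLS answer and the corresponding golden answer:

```python
# Sketch of one possible answer-quality metric: token-level F1 overlap between
# an OLS-generated answer and the golden answer. Illustrative assumption only;
# the OLS team may choose different metrics (e.g. human or LLM-based grading).
from collections import Counter

def token_f1(generated: str, golden: str) -> float:
    gen = Counter(generated.lower().split())
    gold = Counter(golden.lower().split())
    overlap = sum((gen & gold).values())   # shared tokens, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(gold.values())
    return 2 * precision * recall / (precision + recall)

score = token_f1(
    "Use the EgressIP custom resource to configure egress IPs.",
    "Egress IPs are configured with the EgressIP custom resource.",
)
print(f"F1 = {score:.2f}")  # higher scores indicate closer agreement with the golden answer
```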