Type: Initiative
Resolution: Unresolved
Priority: Major
Product / Portfolio Work
Goal
Evaluate the LLM model's performance in assisting users with end-to-end OpenShift deployment on a specified platform and with OpenShift-related actions and operations.
Benefit Hypothesis (Why):
- Identify what works: Track which models and configurations perform best for specific tasks (accuracy, response time, user satisfaction).
- Hallucination Detection: Ensuring factual accuracy and minimizing the generation of false information.
- Retrieval Relevance: Verifying that our RAG system pulls the most pertinent information to ground the model's response.
- Toxicity Detection: Filtering for and eliminating harmful or inappropriate content.
- Summarization Performance: Evaluating the coherence, accuracy, and conciseness of summaries.
- Code Generation: Checking the correctness and readability of generated code, including install configs and manifests (a minimal validation sketch follows this list).
- Spot degradation: Detect when performance drops over time due to data drift or model updates.
- Catch edge cases: Document how the chat assistant handles unusual inputs, errors, or boundary conditions.
- Regulatory requirements: Many industries require documented testing for AI systems (healthcare, finance, etc.)
- Audit trails: Provide evidence of due diligence in model selection and validation.
- Risk management: Document potential failure modes and mitigation strategies.
- Maintain user trust: Provide documentation so users can make a data-driven decision about which model to choose and gain confidence before deploying the solution.
- ROI demonstration: Show Red Hat stakeholders and customers improvements in performance metrics over time.
- Resource allocation: Make informed decisions about where to invest development effort.
- Competitive advantage: Systematic testing and documentation leads to better products.
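As an illustration of what the code-generation and performance checks above could look like in practice, here is a minimal sketch in Python. The `ask_model` callable stands in for the assistant under test, the required install-config keys are a simplified subset, and the record fields are placeholders rather than a defined schema.

```python
import time
import yaml  # PyYAML, used to check that a generated install-config parses

# Simplified subset of top-level install-config.yaml keys (illustrative only).
REQUIRED_INSTALL_CONFIG_KEYS = {"apiVersion", "baseDomain", "metadata", "platform"}


def validate_install_config(text: str) -> bool:
    """Return True if the generated install-config parses and has the core keys."""
    try:
        doc = yaml.safe_load(text)
    except yaml.YAMLError:
        return False
    return isinstance(doc, dict) and REQUIRED_INSTALL_CONFIG_KEYS <= doc.keys()


def evaluate_prompt(ask_model, prompt: str) -> dict:
    """Run one prompt through the assistant and record basic quality signals."""
    start = time.monotonic()
    answer = ask_model(prompt)  # hypothetical client for the model under test
    return {
        "prompt": prompt,
        "latency_s": round(time.monotonic() - start, 3),
        "install_config_valid": validate_install_config(answer),
        # Hallucination, retrieval-relevance, and toxicity scores would be
        # filled in here by separate judges/classifiers (not shown).
    }
```

A real harness would run many such prompts per platform and persist the records over time, which is what makes the degradation and trend points above measurable.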
Resources
- Assisted MCP with a local model running on CPU
- https://github.com/IBM/ITBench (Abstract)
- k8s-bench - a benchmark, part of the kubectl-ai project, for evaluating the performance of different LLM models on Kubernetes-related tasks (a rough sketch of this task style follows this list).
- This is likely very similar in methodology to the approach RHEL Lightspeed intends to use to evaluate quality in relation to RHEL installation.
- LCORE-56 (Establish common benchmarks and practices for LLM evaluations)
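To make the k8s-bench reference above concrete, the sketch below shows the general shape of such a benchmark case: a natural-language prompt plus a cluster-side verification command. This is not k8s-bench's actual schema; the task name, command, and expected output are hypothetical.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class BenchTask:
    """One benchmark case: what to ask the assistant, and how to verify it."""
    name: str
    prompt: str              # instruction given to the assistant
    verify_cmd: list[str]    # command whose output proves the task succeeded
    expected_output: str     # text the verification command must print


def verify(task: BenchTask) -> bool:
    """Run the verification command against the target cluster."""
    result = subprocess.run(task.verify_cmd, capture_output=True, text=True)
    return result.returncode == 0 and task.expected_output in result.stdout


# Hypothetical case: the assistant was asked to scale a deployment.
example = BenchTask(
    name="scale-deployment",
    prompt="Scale the 'web' deployment in namespace 'demo' to 3 replicas.",
    verify_cmd=["kubectl", "-n", "demo", "get", "deploy", "web",
                "-o", "jsonpath={.spec.replicas}"],
    expected_output="3",
)
```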
Responsibilities
Evaluation (Process Quality & Optimization) workstream – see Conversational Installation Experience for OpenShift.
Success Criteria
- A delivery pipeline that produces a concrete, repeatable measurement of the quality of the AI-assisted OpenShift installation results for the model under evaluation (a rough sketch of such a quality gate follows).
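A minimal sketch of how such a pipeline could gate on that measurement, assuming evaluation records like those sketched earlier are written to a JSON file; the file name, the `passed` field, and the 90% threshold are placeholders.

```python
import json
import sys


def quality_gate(results_path: str, min_pass_rate: float = 0.9) -> None:
    """Fail the pipeline stage if too few evaluation cases passed."""
    with open(results_path) as f:
        records = json.load(f)  # expected: a list of {"passed": bool, ...} records
    pass_rate = sum(r["passed"] for r in records) / max(len(records), 1)
    print(f"pass rate: {pass_rate:.2%} (threshold {min_pass_rate:.0%})")
    if pass_rate < min_pass_rate:
        sys.exit(1)  # non-zero exit marks the delivery-pipeline stage as failed


if __name__ == "__main__":
    quality_gate(sys.argv[1] if len(sys.argv) > 1 else "results.json")
```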
Results
Add results here once the Initiative is started. Discussions and updates are recommended once per quarter, captured as bullets.