Type: Epic
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:

Epic Name:
Develop interface for OCP Updates and LLM
Epic Status:
To Do
Activity Type:
Product / Portfolio Work
Parent Link:
OCPSTRAT-2241 Low friction OpenShift Upgrade Experience with OLS and update-agent
Hierarchy Progress Bar:

0% To Do, 0% In Progress, 100% Done
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
Size:
None

Target Version:

openshift-4.21
Release Blocker:
None

Epic Goal

The purpose of this epic is to develop a Model Context Protocol (MCP) tool/server: an interface that enables a Large Language Model (LLM) to interact with OpenShift's update components, including the Cluster Version Operator (CVO), the OpenShift Update Service (OSUS), and Cincinnati (the upstream update service). This interface will allow the LLM to perform the following tasks:

1. Retrieve update recommendations for OpenShift clusters.

2. Check the status of ongoing updates.

3. Precheck the cluster for upgrades by analyzing existing conditions.

4. Access miscellaneous update-related information from the cluster.

This new capability will provide customers with a more intelligent, automated, and seamless way to manage updates for their OpenShift clusters, enhancing efficiency and user experience.
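Task 1, retrieving update recommendations, ultimately comes down to reading the update graph that Cincinnati and OSUS serve. A minimal sketch of that lookup follows, assuming the Cincinnati graph JSON shape: a `nodes` list whose entries carry a `version`, and an `edges` list of `[from_index, to_index]` pairs indexing into `nodes`. The function names and the sample graph are hypothetical and illustrative only; this is not the MCP tool's actual interface, and it ignores conditional (risk-annotated) update edges.

```python
from typing import Any


def _version_key(version: str) -> tuple[int, ...]:
    # Compare only the numeric X.Y.Z core so that 4.20.10 sorts after 4.20.9.
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def recommended_updates(graph: dict[str, Any], current_version: str) -> list[str]:
    """Return the versions reachable in one hop from current_version.

    Assumes the Cincinnati graph JSON shape: "nodes" is a list of objects with a
    "version" key, and "edges" is a list of [from_index, to_index] pairs that
    index into "nodes". Conditional (risk-annotated) edges are not considered.
    """
    versions = [node["version"] for node in graph.get("nodes", [])]
    if current_version not in versions:
        # The cluster's version is not part of this channel's graph.
        return []
    current_index = versions.index(current_version)
    # Collect every target whose edge starts at the current node.
    targets = {versions[dst] for src, dst in graph.get("edges", []) if src == current_index}
    return sorted(targets, key=_version_key)


# Hypothetical, hand-written graph for illustration only.
graph = {
    "nodes": [{"version": "4.20.8"}, {"version": "4.20.9"}, {"version": "4.20.10"}],
    "edges": [[0, 1], [0, 2], [1, 2]],
}
print(recommended_updates(graph, "4.20.8"))  # ['4.20.9', '4.20.10']
```

Keeping the graph traversal separate from fetching the graph lets the same logic work whether the data comes from OSUS on a disconnected cluster or from the upstream service.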

Why is this important?

This epic is critical because it introduces AI-driven automation to the OpenShift update process, aligning with the growing demand for intelligent infrastructure management. The key benefits include:

Improved Automation: Automates routine tasks like retrieving update recommendations and monitoring progress, reducing manual effort.

Enhanced Decision-Making: The LLM can analyze complex update data and provide actionable insights, helping customers make informed update decisions.

Reduced Risk: Running prechecks with an LLM can surface potential issues before an upgrade starts, minimizing failed updates and downtime.

Strategic Value: This positions OpenShift as a leader in AI-enhanced cluster management, meeting customer expectations for modern, self-managing systems.

Scenarios

Cluster Administrator
- Action: Requests update recommendations for their OpenShift cluster.
- Platform Specifications: OpenShift 4.x cluster running on AWS.
- User Persona: A cluster administrator responsible for maintaining cluster health.
- Details: The administrator uses the LLM to query the MCP tool, which interfaces with Cincinnati and OSUS to recommend updates based on the cluster's current version and configuration.
DevOps Engineer
- Action: Monitors the status of an ongoing update.
- Platform Specifications: OpenShift 4.x cluster running on-premises.
- User Persona: A DevOps engineer ensuring successful update completion.
- Details: The engineer asks the LLM, which in turn calls the MCP tool, to check real-time update status, view logs, and raise LLM-generated alerts if issues arise.
Site Reliability Engineer (SRE)
- Action: Performs prechecks before an upgrade.
- Platform Specifications: OpenShift 4.x cluster running on GCP.
- User Persona: An SRE focused on cluster reliability and minimizing downtime.
- Details: The SRE asks the LLM to call the MCP tool to run prechecks; the LLM then analyzes cluster conditions and confirms upgrade readiness.
Support Engineer
- Action: Retrieves miscellaneous update-related information for troubleshooting.
- Platform Specifications: OpenShift 4.x cluster running on any platform (cloud, bare metal, etc.).
- User Persona: A support engineer assisting a customer with an update issue.
- Details: The engineer uses the LLM, through the MCP tool, to access logs, metrics, and other data to diagnose and resolve the problem.
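The status-monitoring and precheck scenarios both rely on data the CVO already publishes on the cluster's `ClusterVersion` resource: the `Progressing` and `Failing` conditions, the update `history` (newest entry first), and the `Upgradeable` condition, which blocks minor-version updates when it is `False`. A minimal sketch of how a tool might summarize that data follows, assuming the object has already been fetched as JSON (for example with `oc get clusterversion version -o json`). The function names and the sample object are hypothetical and illustrative only, not the MCP tool's actual interface.

```python
from typing import Any, Optional


def find_condition(cv: dict[str, Any], cond_type: str) -> Optional[dict[str, Any]]:
    """Return the condition of the given type from .status.conditions, if present."""
    for cond in cv.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def is_true(cond: Optional[dict[str, Any]]) -> bool:
    # A missing condition is treated as not set.
    return cond is not None and cond.get("status") == "True"


def update_summary(cv: dict[str, Any]) -> dict[str, Any]:
    """Summarize update progress and minor-update readiness from a ClusterVersion object."""
    status = cv.get("status", {})
    history = status.get("history", [])
    # history is ordered newest first, so entry 0 is the most recent update attempt.
    latest = history[0] if history else {}
    upgradeable = find_condition(cv, "Upgradeable")
    # Upgradeable=False blocks minor-version updates; an absent condition is not a blocker.
    blocked = upgradeable is not None and upgradeable.get("status") == "False"
    return {
        "desired_version": status.get("desired", {}).get("version"),
        "latest_state": latest.get("state"),  # "Completed" or "Partial"
        "in_progress": is_true(find_condition(cv, "Progressing")),
        "failing": is_true(find_condition(cv, "Failing")),
        "minor_update_blocked": blocked,
        "blocker_message": upgradeable.get("message") if blocked else None,
    }


# Hypothetical, hand-written ClusterVersion excerpt for illustration only.
cv = {
    "status": {
        "desired": {"version": "4.20.10"},
        "history": [
            {"state": "Partial", "version": "4.20.10"},
            {"state": "Completed", "version": "4.20.8"},
        ],
        "conditions": [
            {"type": "Progressing", "status": "True"},
            {"type": "Failing", "status": "False"},
            {"type": "Upgradeable", "status": "False", "message": "example blocker"},
        ],
    }
}
print(update_summary(cv)["in_progress"])  # True
```

Returning a small structured summary, rather than raw conditions, gives the LLM a compact, predictable input to explain to the user; a fuller precheck would inspect more than these few fields.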

Dependencies

Internal Dependencies:

- The CVO, OSUS, and Cincinnati teams must either provide APIs or interfaces that the MCP tool can use to interact with these components, or develop a standalone MCP tool that interfaces with, or works alongside, these existing components.

- Effective prompts are needed to ensure the model can correctly process and interpret data from the update components.

External Dependencies:

- None identified at this time.

Contributing Teams

Development: OTA

Documentation: OTA

QE: OTA

PX: None

Others:

- Security team: Reviews security implications of exposing update data to an LLM.

- Performance team: Ensures the tool does not impact cluster performance.

Acceptance Criteria

The LLM can query the MCP tool to retrieve update recommendations for a given cluster.

The MCP tool can report the current status of an ongoing update.

The MCP tool can perform prechecks for upgrades and provide a readiness report.

The MCP tool can retrieve miscellaneous update-related information (e.g., logs, metrics).

All interactions with update components are secure and do not introduce vulnerabilities.

Drawbacks or Risk

Complexity: Integrating an LLM with update components may increase development and maintenance complexity.

Security Concerns: Exposing update data to an LLM could pose security risks if not properly managed.

Limited Audience: Only customers comfortable with AI tools may adopt this feature, potentially limiting its impact.

Redundancy: This work could be superseded by other planned update management tools.

Done - Checklist

CI Testing: Tests are merged and completing successfully.

Documentation: Content development is complete, including user guides and API documentation.

QE: Test scenarios are written and executed successfully, covering all use cases.

Technical Enablement: Slides are complete (if requested by PLM).

Other:

- Security review is conducted and issues addressed.

- Performance testing confirms no degradation in cluster performance.

is cloned by

OTA-1703 PoC: Create Rules-Based Routing (Closed)

Assignee: Pratik Mahajan

Reporter: Pratik Mahajan

Need Info From: None

Contributors: None

QA Contact: None

Doc Contact: None

Votes: 0

Watchers: 7

Created: 2025/06/06 5:32 PM

Updated: 2025/11/06 1:14 PM

Resolved: 2025/08/26 4:54 PM
