-
Epic
-
Resolution: Done
-
Major
-
Develop interface for OCP Updates and LLM
-
Product / Portfolio Work
-
0% To Do, 0% In Progress, 100% Done
Epic Goal
The purpose of this epic is to develop an interface, implemented as a Model Context Protocol (MCP) tool/server, that enables a Large Language Model (LLM) to interact with OpenShift's update components: the Cluster Version Operator (CVO), the OpenShift Update Service (OSUS), and Cincinnati (the upstream update service). This interface will allow the LLM to perform the following tasks:
1. Retrieve update recommendations for OpenShift clusters.
2. Check the status of ongoing updates.
3. Precheck the cluster for upgrades by analyzing existing conditions.
4. Access miscellaneous update-related information from the cluster.
This new capability will provide customers with a more intelligent, automated, and seamless way to manage updates for their OpenShift clusters, enhancing efficiency and user experience.
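As a rough illustration of the four tasks above, the sketch below registers one tool per task and answers `tools/list` and `tools/call` requests in the JSON-RPC style that MCP uses. The tool names, input schemas, and the placeholder handler are hypothetical and are not taken from this epic. A real server would most likely be built on an MCP SDK and would also handle the initialization handshake, which is omitted here.

```python
from typing import Any, Callable

# Hypothetical tool names, one per task in the epic goal.
TOOLS: dict[str, dict[str, Any]] = {
    "get_update_recommendations": {
        "description": "Retrieve update recommendations for a cluster.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "channel": {"type": "string"},
                "version": {"type": "string"},
            },
            "required": ["channel", "version"],
        },
    },
    "get_update_status": {
        "description": "Check the status of an ongoing update.",
        "inputSchema": {"type": "object", "properties": {}},
    },
    "precheck_update": {
        "description": "Precheck the cluster by analyzing existing conditions.",
        "inputSchema": {"type": "object", "properties": {}},
    },
    "get_update_info": {
        "description": "Access miscellaneous update-related information.",
        "inputSchema": {"type": "object", "properties": {}},
    },
}


def _placeholder(arguments: dict[str, Any]) -> str:
    # Stand-in for real logic that would talk to CVO, OSUS, or Cincinnati.
    return "not implemented in this sketch"


HANDLERS: dict[str, Callable[[dict[str, Any]], str]] = {
    name: _placeholder for name in TOOLS
}


def handle_request(request: dict[str, Any]) -> dict[str, Any]:
    """Dispatch a single JSON-RPC request to the tool registry."""
    req_id = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        tools = [{"name": name, **spec} for name, spec in TOOLS.items()]
        return {"jsonrpc": "2.0", "id": req_id, "result": {"tools": tools}}
    if method == "tools/call":
        params = request.get("params", {})
        name = params.get("name")
        if name not in HANDLERS:
            error = {"code": -32602, "message": f"Unknown tool: {name}"}
            return {"jsonrpc": "2.0", "id": req_id, "error": error}
        text = HANDLERS[name](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
        return {"jsonrpc": "2.0", "id": req_id, "result": result}
    error = {"code": -32601, "message": f"Method not found: {method}"}
    return {"jsonrpc": "2.0", "id": req_id, "error": error}
```

Keeping the tool descriptions separate from the handlers means the LLM-facing contract can be reviewed without reading the code that touches the cluster.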
Why is this important?
This epic is critical because it introduces AI-driven automation to the OpenShift update process, aligning with the growing demand for intelligent infrastructure management. The key benefits include:
- Improved Automation: Automates routine tasks like retrieving update recommendations and monitoring progress, reducing manual effort.
- Enhanced Decision-Making: The LLM can analyze complex update data and provide actionable insights, helping customers make informed update decisions.
- Reduced Risk: Prechecking clusters with an LLM can identify potential issues before upgrades, minimizing failures and downtime.
- Priority: This positions OpenShift as a leader in AI-enhanced cluster management, meeting customer expectations for modern, self-managing systems.
Scenarios
- Cluster Administrator
  - Action: Requests update recommendations for their OpenShift cluster.
  - Platform Specifications: OpenShift 4.x cluster running on AWS.
  - User Persona: A cluster administrator responsible for maintaining cluster health.
  - Details: The administrator uses the LLM to query the MCP tool, which interfaces with Cincinnati and OSUS to recommend updates based on the cluster's current version and configuration.
- DevOps Engineer
  - Action: Monitors the status of an ongoing update.
  - Platform Specifications: OpenShift 4.x cluster running on-premises.
  - User Persona: A DevOps engineer ensuring successful update completion.
  - Details: The engineer uses the LLM, which in turn uses the MCP tool, to check real-time update status, view logs, and receive LLM-generated alerts if issues arise.
- Site Reliability Engineer (SRE)
  - Action: Performs prechecks before an upgrade.
  - Platform Specifications: OpenShift 4.x cluster running on GCP.
  - User Persona: An SRE focused on cluster reliability and minimizing downtime.
  - Details: The SRE uses the LLM to call the MCP tool to run prechecks, leveraging the LLM to analyze cluster conditions and confirm upgrade readiness.
- Support Engineer
  - Action: Retrieves miscellaneous update-related information for troubleshooting.
  - Platform Specifications: OpenShift 4.x cluster running on cloud, bare metal, or other platforms.
  - User Persona: A support engineer assisting a customer with an update issue.
  - Details: The engineer uses the LLM, which calls the MCP tool, to access logs, metrics, and other data needed to diagnose and resolve the problem.
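To make the Cluster Administrator scenario more concrete, here is a minimal sketch of how a recommendation tool might pick update targets from an update graph. It assumes the graph is a JSON object with a `nodes` list (each entry carrying a `version`) and an `edges` list of `[from_index, to_index]` pairs, which is how Cincinnati-style graphs are commonly shaped. Fetching the graph, conditional edges, and risk evaluation are left out, and the function name and sample versions are hypothetical.

```python
from typing import Any


def _version_key(version: str) -> tuple[int, ...]:
    # Sort on the numeric core only; pre-release and build suffixes are ignored.
    core = version.split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split(".") if part.isdigit())


def recommended_updates(graph: dict[str, Any], current_version: str) -> list[str]:
    """Return versions reachable from current_version by one edge, oldest first."""
    versions = [node["version"] for node in graph.get("nodes", [])]
    if current_version not in versions:
        # The cluster's version is not in this channel's graph.
        return []
    current_index = versions.index(current_version)
    targets = {
        versions[dst]
        for src, dst in graph.get("edges", [])
        if src == current_index
    }
    return sorted(targets, key=_version_key)


# Hypothetical sample data, not real release information.
SAMPLE_GRAPH = {
    "nodes": [
        {"version": "4.14.1"},
        {"version": "4.14.2"},
        {"version": "4.14.10"},
        {"version": "4.15.0"},
    ],
    "edges": [[0, 1], [0, 2], [1, 2], [2, 3]],
}

print(recommended_updates(SAMPLE_GRAPH, "4.14.1"))  # ['4.14.2', '4.14.10']
```

Returning plain version strings keeps the tool output small, and the LLM can then ask for details on a specific target rather than receiving the whole graph.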
Dependencies
- Internal Dependencies:
  - The CVO, OSUS, and Cincinnati teams must either provide APIs or interfaces that the MCP tool can use to interact with these components, or develop a standalone MCP tool that interfaces with or works alongside them.
  - Effective prompts to ensure the model can process and interpret data from the update components.
- External Dependencies:
  - None identified at this time.
Contributing Teams
- Development: OTA
- Documentation: OTA
- QE: OTA
- PX: —
- Others:
  - Security team: Reviews security implications of exposing update data to an LLM.
  - Performance team: Ensures the tool does not impact cluster performance.
Acceptance Criteria
- The LLM can query the MCP tool to retrieve update recommendations for a given cluster.
- The MCP tool can display the current status of an ongoing update.
- The MCP tool can perform prechecks for upgrades and provide a readiness report.
- The MCP tool can retrieve miscellaneous update-related information (e.g., logs, metrics).
- All interactions with update components are secure and do not introduce vulnerabilities.
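The precheck criterion could be approached along these lines. The sketch assumes the tool has already read a ClusterVersion-like object whose `status.conditions` entries carry `type`, `status`, and `message` fields. Which conditions count as blocking is an assumption made here for illustration, not something this epic defines, and the function names are hypothetical.

```python
from typing import Any, Optional

# (condition type, status treated as blocking). An illustrative rule set only:
# in practice Upgradeable=False is generally about minor-version updates,
# but this sketch treats it as blocking for simplicity.
BLOCKING_RULES = [
    ("Available", "False"),
    ("Failing", "True"),
    ("Progressing", "True"),
    ("Upgradeable", "False"),
]


def _find_condition(
    conditions: list[dict[str, Any]], cond_type: str
) -> Optional[dict[str, Any]]:
    for condition in conditions:
        if condition.get("type") == cond_type:
            return condition
    return None


def precheck(cluster_version: dict[str, Any]) -> dict[str, Any]:
    """Build a small readiness report from ClusterVersion-style conditions."""
    conditions = cluster_version.get("status", {}).get("conditions", [])
    findings = []
    for cond_type, blocking_status in BLOCKING_RULES:
        condition = _find_condition(conditions, cond_type)
        # A missing condition is not treated as a finding in this sketch.
        if condition is not None and condition.get("status") == blocking_status:
            message = condition.get("message", "no message")
            findings.append(f"{cond_type}={blocking_status}: {message}")
    return {"ready": not findings, "findings": findings}
```

A report shaped like `{"ready": ..., "findings": [...]}` gives the LLM both a clear verdict and the raw condition messages it needs to explain that verdict to the user.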
Drawbacks or Risk
- Complexity: Integrating an LLM with update components may increase development and maintenance complexity.
- Security Concerns: Exposing update data to an LLM could pose security risks if not properly managed.
- Limited Audience: Only customers comfortable with AI tools may adopt this feature, potentially limiting its impact.
- Redundancy: This work could be superseded by other planned update management tools.
Done - Checklist
- CI Testing: Tests are merged and completing successfully.
- Documentation: Content development is complete, including user guides and API documentation.
- QE: Test scenarios are written and executed successfully, covering all use cases.
- Technical Enablement: Slides are complete (if requested by PLM).
- Other:
  - Security review is conducted and issues addressed.
  - Performance testing confirms no degradation in cluster performance.
- is cloned by: OTA-1703 PoC: Create Rules-Based Routing (Dev Complete)