- Initiative
- Resolution: Unresolved
- Major
- Product / Portfolio Work
- 0% To Do, 0% In Progress, 100% Done
Feature Overview (aka. Goal Summary)
An elevator pitch (value statement) that describes the Feature in a clear, concise way. Complete during New status.
[This feature is a placeholder for an AI agent for upgrades]
Proposal : https://docs.google.com/document/d/1a85aQ2HERcLGc0S_Z2cc-4yQumVf2SIBJrVk3BjmxFM/edit?tab=t.0#heading=h.s0hk586bp0a1
This feature is an exploration effort to create tools and an MCP server for an AI-driven OpenShift update experience.
Note: The current outcome of this Jira is a prototype rather than a GA-quality product feature, which could take much more time.
As of today (June 5, 2025), this work rests on many assumptions because several required pieces are not yet available. The assumptions are that an MCP server for components, or a single internal one, exists, and possibly also a status/precheck API or status/precheck functions in MCP.
Goals (aka. expected user outcomes)
The observable functionality that the user now has as a result of receiving this feature. Include the anticipated primary user type/persona and which existing features, if any, will be expanded. Complete during New status.
- The upgrade-agent can leverage existing MCP servers, OpenShift update, precheck, or status commands, a status/recommend API, or component status to drive upgrades.
- The upgrade-agent evaluates live cluster context (topology, operator catalog, workload health, compliance rules) and provides tailored upgrade paths.
- The upgrade-agent has admin permissions for now.
- The agent creates a report before starting the upgrade, similar to a precheck, and the user can receive this report. The report can cover a collection of operators, CRDs, nodes, etc. Risk complexity may also be computed based on the history of configuration changes.
- The agent selects the safest path, starts the update, and predicts the upgrade duration.
- The agent creates a report after the upgrade listing the components that were upgraded and those that were not.
- Can the agent decide to cordon nodes, drain nodes, or change PDBs? It should try to perform self-healing functions.
- The agent interacts with the MCP server to get cluster context and take actions.
- The MCP server may expose status, precheck, upgradecontext, recommendations, etc.
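The pre-upgrade report and the "safest path" goal above can be sketched roughly as follows. This is only an illustration under stated assumptions: `UpgradePath`, `PrecheckReport`, `risk_score`, `select_safest_path`, and the weighting used for the score are all hypothetical and do not come from any existing MCP server or OpenShift API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UpgradePath:
    """A candidate upgrade target (hypothetical shape)."""
    target_version: str
    known_risks: int            # known risks reported for this path
    recent_config_changes: int  # configuration changes seen in recent history


@dataclass
class PrecheckReport:
    """Report the agent could hand to the user before starting an upgrade."""
    degraded_operators: list[str] = field(default_factory=list)
    not_ready_nodes: list[str] = field(default_factory=list)
    candidate_paths: list[UpgradePath] = field(default_factory=list)

    def blockers(self) -> list[str]:
        """Everything that should be fixed before an upgrade starts."""
        return (
            [f"operator/{name}" for name in self.degraded_operators]
            + [f"node/{name}" for name in self.not_ready_nodes]
        )


def risk_score(path: UpgradePath) -> int:
    # Assumed weighting: a known risk counts ten times more than one
    # recent configuration change. The real weighting is an open question.
    return path.known_risks * 10 + path.recent_config_changes


def select_safest_path(report: PrecheckReport) -> Optional[UpgradePath]:
    """Return the lowest-risk path, or None if blockers or no paths exist."""
    if report.blockers() or not report.candidate_paths:
        return None
    return min(report.candidate_paths, key=risk_score)
```

In this sketch the agent refuses to pick a path while blockers exist, which matches the idea of reporting first and upgrading second; whether the agent should instead try self-healing at that point is left open here.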
Requirements (aka. Acceptance Criteria):
A list of specific needs or objectives that a feature must deliver in order to be considered complete. Be sure to include nonfunctional requirements such as security, reliability, performance, maintainability, scalability, usability, etc. Initial completion during Refinement status.
Note: Not all of the items below are required for the PoC; only a bare minimum is needed to show the feasibility of the agent performing upgrades.
Acceptance criteria:
- Upgrade is successful.
- In case the upgrade is problematic, the agent is able to pinpoint the component failure.
- The agent should stay alive even after the upgrade to capture component failures.
- The agent monitors components.
….
- oc-upgrade-agent CLI to interact with the agent.
- oc-upgrade-agent CLI to allow users to give prompts and query details of a problematic upgrade (could leverage OLS instead of a CLI).
- oc-upgrade-agent “tell me the problematic nodes”
- oc-upgrade-agent “tell me the problematic issue XXXX details from documentation” (queries existing documentation via MCP)
- oc-upgrade-agent “find issues with cluster before upgrade”
- The agent can create a maintenance upgrade window plan to schedule the update for later, when the cluster sees less workload.
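As one possible reading of "pinpoint the component failure", the sketch below scans a simplified list of cluster operator statuses and reports which ones look broken. The input shape loosely mirrors the `Available` and `Degraded` condition types of ClusterOperator resources, but the dictionaries, the `failing_components` helper, and the sample messages are assumptions made for illustration, not the output of a real MCP server.

```python
def failing_components(cluster_operators: list[dict]) -> dict[str, str]:
    """Map each unhealthy operator name to a short explanation.

    Each entry is assumed to look like:
      {"name": "...", "conditions": [{"type": ..., "status": ..., "message": ...}]}
    """
    failures: dict[str, str] = {}
    for operator in cluster_operators:
        # Index conditions by type so they can be looked up directly.
        conditions = {c["type"]: c for c in operator.get("conditions", [])}
        degraded = conditions.get("Degraded", {})
        available = conditions.get("Available", {})
        if degraded.get("status") == "True":
            failures[operator["name"]] = degraded.get("message", "Degraded")
        elif available.get("status") == "False":
            failures[operator["name"]] = available.get("message", "Unavailable")
    return failures


# Hypothetical sample data for illustration only.
sample = [
    {"name": "ingress", "conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Degraded", "status": "True", "message": "router pods not ready"},
    ]},
    {"name": "dns", "conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Degraded", "status": "False"},
    ]},
    {"name": "monitoring", "conditions": [
        {"type": "Available", "status": "False"},
    ]},
]

if __name__ == "__main__":
    for name, reason in failing_components(sample).items():
        print(f"{name}: {reason}")
```

A prompt such as “find issues with cluster before upgrade” could be answered from a result like this, with the agent turning the mapping into natural-language text.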
Use Cases (Optional):
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
<your text here>
Questions to Answer (Optional):
Include a list of refinement / architectural questions that may need to be answered before coding can begin. Initial completion during Refinement status.
Out of Scope
High-level list of items that are out of scope. Initial completion during Refinement status.
- Take the simplest upgrade initially and iterate towards complex ones (e.g. disconnected older OpenShift versions or
- Single-hop (n+1) upgrades first, then explore double-hop (n+2) upgrades.
- All complicated cluster setups are out of scope for the PoC.
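The single-hop versus double-hop distinction above could be checked with a small helper like the one below. It assumes that "n" refers to the minor version of an `X.Y.Z` version string, which is an interpretation rather than something stated in this feature; the function names `minor_hops`, `classify_hop`, and `in_initial_scope` are hypothetical.

```python
def minor_hops(current: str, target: str) -> int:
    """Number of minor versions between two X.Y[.Z] version strings."""
    cur_major, cur_minor = (int(part) for part in current.split(".")[:2])
    tgt_major, tgt_minor = (int(part) for part in target.split(".")[:2])
    if cur_major != tgt_major:
        raise ValueError("major version changes are not handled in this sketch")
    return tgt_minor - cur_minor


def classify_hop(current: str, target: str) -> str:
    """Label an upgrade as same-minor, single-hop, double-hop, or unsupported."""
    hops = minor_hops(current, target)
    if hops == 0:
        return "same-minor"
    if hops == 1:
        return "single-hop"
    if hops == 2:
        return "double-hop"
    return "unsupported"  # downgrades or jumps of three or more minors


def in_initial_scope(current: str, target: str) -> bool:
    # Assumption: only single-hop (n+1) upgrades are attempted first.
    return classify_hop(current, target) == "single-hop"
```

Whether updates within the same minor version also count as "simplest" is not settled by the list above, so the sketch labels them separately instead of deciding.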
Background
Provide any additional context that is needed to frame the feature. Initial completion during Refinement status.
Customer Considerations
Provide any additional customer-specific considerations that must be made when designing and delivering the Feature. Initial completion during Refinement status.
<your text here>
Documentation Considerations
Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation. Initial completion during Refinement status.
<your text here>
Interoperability Considerations
Which other projects, including ROSA/OSD/ARO, and versions in our portfolio does this feature impact? What interoperability test scenarios should be factored by the layered products? Initial completion during Refinement status.
<your text here>
Post Completion Review - Actual Results
After completing the work (as determined by the "when" in Expected Results above), list the actual results observed / measured during Post Completion review(s).
- is related to: ACM-15495 Provide automatic lifecycle updates for OCP managed clusters and operators (Backlog)