Feature
Resolution: Unresolved
Major
Product / Portfolio Work
0% To Do, 100% In Progress, 0% Done
Feature Overview (aka. Goal Summary)
Enable the use of Model Context Protocol (MCP) to support Agentic AI capabilities for automated, context-aware troubleshooting of OVN-Kubernetes networking issues in OpenShift.
Goals (aka. expected user outcomes)
MCP will allow an intelligent agent to persist, retrieve, and reason over structured contextual information, including node states, flow rules, pod connectivity, and previous debugging outcomes. This persistent context lets the agent track problem-resolution attempts across sessions, maintain a coherent understanding of cluster state over time, and suggest or even execute diagnostic commands as part of an autonomous or semi-autonomous troubleshooting workflow.
This feature enhances the supportability of OpenShift networking by enabling AI agents to:
- Contextualize OVN-K-specific anomalies (e.g., dropped flows, NB/SB DB inconsistencies)
- Learn from historic cluster issues
- Persist recommended or executed steps with outcomes
- Interact with users through guided root cause exploration
The goals are to:
- Reduce time-to-resolution for common and complex OVN-Kubernetes networking problems
- Provide guided diagnostics via AI agents informed by persistent and evolving model context
- Enable scalable troubleshooting automation across large or multi-cluster OpenShift environments
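To make "persist recommended or executed steps with outcomes" and "track problem resolution attempts across sessions" more concrete, here is a minimal sketch of a file-backed context store. Everything in it is an assumption for illustration: the `StepRecord` and `ContextStore` names, the symptom tags, and the JSON file layout are hypothetical, and nothing here is tied to a particular MCP SDK or to the design the team will choose.

```python
import json
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class StepRecord:
    """One recommended or executed troubleshooting step and its outcome."""
    symptom: str    # hypothetical tag, e.g. "pod-connectivity-failure"
    command: str    # diagnostic command that was suggested or run
    executed: bool  # False if the step was only recommended
    outcome: str    # free-text result noted by the agent or the user


class ContextStore:
    """File-backed store so that records survive across agent sessions."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def _load(self) -> list[dict]:
        # A missing file simply means no history has been recorded yet.
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def persist(self, record: StepRecord) -> None:
        records = self._load()
        records.append(asdict(record))
        self.path.write_text(json.dumps(records, indent=2))

    def recall(self, symptom: str) -> list[StepRecord]:
        """Return every earlier step recorded for the given symptom tag."""
        return [StepRecord(**r) for r in self._load() if r["symptom"] == symptom]


if __name__ == "__main__":
    store_path = Path(tempfile.mkdtemp()) / "ovnk_context.json"

    # Session 1: the agent records a step it ran.
    ContextStore(store_path).persist(
        StepRecord("pod-connectivity-failure", "ovn-nbctl show", True,
                   "logical switch port missing for the pod")
    )

    # Session 2: a fresh store object reads the same file and recalls it.
    for step in ContextStore(store_path).recall("pod-connectivity-failure"):
        print(step.command, "->", step.outcome)
```

A real implementation would likely need locking, schema versioning, and a shared backend for multi-cluster use; the sketch only shows the persist and recall round trip.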
Requirements (aka. Acceptance Criteria):
- MCP stores and retrieves OVN-Kubernetes context data reliably across AI sessions
- Agent can recall relevant historical troubleshooting steps and apply them to current incidents
- At least 3 predefined troubleshooting scenarios are supported by the AI agent
- Agent suggestions are validated to reduce false positives and irrelevant actions
- Documentation is available for extending MCP schemas and training new workflows
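One possible reading of the requirement that agent suggestions are validated is a conservative gate that only lets read-only diagnostic commands through. The sketch below assumes such a gate; the `validate_suggestion` function and the contents of the allowlist are hypothetical choices for illustration, not a decided policy.

```python
import shlex

# Hypothetical allowlist of read-only command prefixes (an assumption,
# not an agreed policy). Each entry is matched against the leading tokens.
READ_ONLY_PREFIXES = (
    ("oc", "get"),
    ("oc", "describe"),
    ("ovn-nbctl", "show"),
    ("ovn-sbctl", "show"),
    ("ovs-ofctl", "dump-flows"),
    ("ovn-trace",),
)

# Characters that could chain or redirect commands if passed to a shell.
SHELL_METACHARACTERS = set(";|&$`<>")


def validate_suggestion(command: str) -> tuple[bool, str]:
    """Return (accepted, reason) for a command suggested by the agent."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False, "shell metacharacters are not allowed"
    try:
        tokens = shlex.split(command)
    except ValueError as err:  # e.g. unbalanced quotes
        return False, f"unparseable command: {err}"
    if not tokens:
        return False, "empty command"
    for prefix in READ_ONLY_PREFIXES:
        if tuple(tokens[: len(prefix)]) == prefix:
            return True, "read-only diagnostic command"
    return False, "command is not on the read-only allowlist"


if __name__ == "__main__":
    for cmd in ("oc get egressip -o yaml", "oc delete pod foo"):
        print(cmd, "->", validate_suggestion(cmd))
```

The metacharacter check is deliberately strict, so it also rejects some legitimate inputs (for example an `ovn-trace` microflow expression containing `&&`); a real validator would need a more careful parser for those cases.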
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, ensure you provide the OCPSTRAT for the configuration that will be supported in the future as well.
Deployment considerations | List applicable specific needs (N/A = not applicable) |
Self-managed, managed, or both | |
Classic (standalone cluster) | |
Hosted control planes | |
Multi node, Compact (three node), or Single node (SNO), or all | |
Connected / Restricted Network | |
Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
Operator compatibility | |
Backport needed (list applicable versions) | |
UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
Other (please specify) | |
Deliverables:
- MCP schema definition tailored for OVN-Kubernetes context (flows, endpoints, OVS states, DB sync states, etc.)
- Initial Agentic AI workflows for common OVN-K issues (e.g., pod connectivity failures, gateway misconfigurations)
- Integration of MCP with troubleshooting logs and telemetry inputs (e.g., must-gather, OVN traceflows)
- User-facing interface for interacting with the AI agent (CLI, web console plugin, or API)
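As a starting point for discussing the first deliverable, the context categories named there (flows, endpoints, OVS states, DB sync states) could be modeled roughly as follows. This is only a sketch of a possible data shape: every class and field name is hypothetical, and how such records would be exposed through MCP is left open.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class FlowEntry:
    """One OpenFlow rule as it might be captured from a bridge dump."""
    bridge: str
    table: int
    match: str
    actions: str
    n_packets: int = 0


@dataclass
class Endpoint:
    """A pod endpoint and where it is scheduled."""
    namespace: str
    pod: str
    node: str
    ip: str


@dataclass
class DBSyncState:
    """Sync status of one OVN database (hypothetical fields)."""
    database: str  # e.g. "nbdb" or "sbdb"
    in_sync: bool
    detail: str = ""


@dataclass
class OVNKContext:
    """Snapshot of OVN-Kubernetes context for one cluster."""
    cluster: str
    flows: list[FlowEntry] = field(default_factory=list)
    endpoints: list[Endpoint] = field(default_factory=list)
    ovs_state: dict[str, str] = field(default_factory=dict)  # node -> status
    db_sync: list[DBSyncState] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses, lists, and dicts.
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    ctx = OVNKContext(cluster="example-cluster")
    ctx.endpoints.append(Endpoint("demo", "web-0", "worker-1", "10.128.2.15"))
    ctx.db_sync.append(DBSyncState("sbdb", in_sync=False, detail="stale chassis"))
    print(ctx.to_json())
```

Keeping the schema as plain serializable records makes it easy to extend with new context types later, which matters for the documentation requirement on extending MCP schemas.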
Use Cases (Optional):
Include use case diagrams, main success scenarios, alternative flow scenarios. Initial completion during Refinement status.
- EgressIP troubleshooting
- User-defined network (UDN) troubleshooting
- BGP troubleshooting
Questions to Answer (Optional):
Out of Scope
- Real-time packet analysis or active traffic interception
- Replacing manual network debugging for advanced edge cases outside current agent capabilities
Background
- The OCP Networking Team already did some spike work during shift week. The results showed promise, which is why this feature is being pursued. Slides: https://docs.google.com/presentation/d/1glNUCcA8zpNY-ckwLIRXyedyY1WPUW-tkI17Jtjo3sg/edit?slide=id.g36b12eb63d6_0_0#slide=id.g36b12eb63d6_0_0
Documentation Considerations
- Documentation must be available for extending MCP schemas and training new workflows
Interoperability Considerations