OpenShift Container Platform (OCP) Strategy / OCPSTRAT-2350

Integrate Model Context Protocol for Agentic AI-driven OVN-Kubernetes Troubleshooting

    • Product / Portfolio Work
    • 0% To Do, 100% In Progress, 0% Done

      Feature Overview (aka. Goal Summary)  

      Enable the use of Model Context Protocol (MCP) to support Agentic AI capabilities for automated, context-aware troubleshooting of OVN-Kubernetes networking issues in OpenShift.

      Goals (aka. expected user outcomes)

      The MCP will allow an intelligent agent to persist, retrieve, and reason over structured contextual information—including node states, flow rules, pod connectivity, and previous debugging outcomes. This persistent context enables the agent to track problem resolution attempts across sessions, maintain a coherent understanding of cluster state over time, and suggest or even execute diagnostic commands as part of an autonomous or semi-autonomous troubleshooting workflow.
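
      A minimal sketch of what "persist and retrieve across sessions" could look like, assuming a local SQLite file as the backing store. The `ContextStore` class, its table layout, and the field names are hypothetical illustrations and are not part of any MCP specification or OVN-Kubernetes API.

```python
import json
import sqlite3
from typing import Any, Dict, List


class ContextStore:
    """Hypothetical persistent store for troubleshooting context (not an MCP API)."""

    def __init__(self, path: str) -> None:
        # A file path makes the data survive across agent sessions.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS attempts ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " cluster TEXT NOT NULL,"
            " symptom TEXT NOT NULL,"
            " step TEXT NOT NULL,"
            " outcome TEXT NOT NULL,"
            " context TEXT NOT NULL)"
        )
        self.conn.commit()

    def record(self, cluster: str, symptom: str, step: str,
               outcome: str, context: Dict[str, Any]) -> None:
        """Persist one troubleshooting step together with its outcome."""
        self.conn.execute(
            "INSERT INTO attempts (cluster, symptom, step, outcome, context)"
            " VALUES (?, ?, ?, ?, ?)",
            (cluster, symptom, step, outcome, json.dumps(context)),
        )
        self.conn.commit()

    def history(self, cluster: str, symptom: str) -> List[Dict[str, Any]]:
        """Return earlier attempts for the same cluster and symptom, oldest first."""
        rows = self.conn.execute(
            "SELECT step, outcome, context FROM attempts"
            " WHERE cluster = ? AND symptom = ? ORDER BY id",
            (cluster, symptom),
        ).fetchall()
        return [
            {"step": step, "outcome": outcome, "context": json.loads(ctx)}
            for step, outcome, ctx in rows
        ]

    def close(self) -> None:
        self.conn.close()


if __name__ == "__main__":
    store = ContextStore(":memory:")  # use a file path for real persistence
    store.record("cluster-a", "pod-connectivity",
                 "inspect flows on node-1", "no drops found", {"node": "node-1"})
    print(store.history("cluster-a", "pod-connectivity"))
    store.close()
```

      Keying the history on cluster and symptom is only one possible choice; a real design would likely need richer matching to recall related incidents.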

      This feature enhances the supportability of OpenShift networking by enabling AI agents to:

      • Contextualize OVN-K-specific anomalies (e.g., dropped flows, NB/SB DB inconsistencies)
      • Learn from historic cluster issues
      • Persist recommended or executed steps with outcomes
      • Interact with users through guided root cause exploration

      The goals are to:

      • Reduce time-to-resolution for common and complex OVN-Kubernetes networking problems
      • Provide guided diagnostics via AI agents informed by persistent and evolving model context
      • Enable scalable troubleshooting automation across large or multi-cluster OpenShift environments

      Requirements (aka. Acceptance Criteria):

      • MCP stores and retrieves OVN-Kubernetes context data reliably across AI sessions
      • Agent can recall relevant historical troubleshooting steps and apply them to current incidents
      • At least 3 predefined troubleshooting scenarios are supported by the AI agent
      • Agent suggestions are validated to reduce false positives and irrelevant actions
      • Documentation is available for extending MCP schemas and training new workflows
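
      One way the "agent suggestions are validated" criterion might be approached is a read-only allowlist that rejects any suggested command not known to be non-mutating. The sketch below is an assumption for illustration only: the `READ_ONLY_PREFIXES` policy and the `validate_suggestion` helper are hypothetical, and a real policy would need review by network engineers.

```python
import shlex
from typing import Tuple

# Hypothetical allowlist of read-only command prefixes (illustrative, not exhaustive).
READ_ONLY_PREFIXES = (
    ("ovn-nbctl", "show"),
    ("ovn-sbctl", "show"),
    ("ovs-vsctl", "show"),
    ("ovs-ofctl", "dump-flows"),
    ("oc", "get"),
    ("oc", "describe"),
)

# Characters that could chain or redirect commands if a shell were involved.
SHELL_METACHARACTERS = set(";|&><`$")


def validate_suggestion(command: str) -> Tuple[bool, str]:
    """Return (allowed, reason) for a command suggested by the agent."""
    if any(ch in SHELL_METACHARACTERS for ch in command):
        return False, "shell metacharacters are not allowed"
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        return False, f"unparseable command: {exc}"
    for prefix in READ_ONLY_PREFIXES:
        if tuple(tokens[: len(prefix)]) == prefix:
            return True, "matches read-only allowlist"
    return False, "not on the read-only allowlist"


if __name__ == "__main__":
    for cmd in ("ovs-ofctl dump-flows br-int", "ovn-nbctl lr-del router0"):
        print(cmd, "->", validate_suggestion(cmd))
```

      An accepted command would still be best executed as an argument list rather than through a shell, so that the check and the execution see the same tokens.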

       

      Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, also provide the OCPSTRAT that tracks support for that configuration in a future release.

      Deployment considerations | List applicable specific needs (N/A = not applicable)
      Self-managed, managed, or both  
      Classic (standalone cluster)  
      Hosted control planes  
      Multi node, Compact (three node), or Single node (SNO), or all  
      Connected / Restricted Network  
      Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x)  
      Operator compatibility  
      Backport needed (list applicable versions)  
      UI need (e.g. OpenShift Console, dynamic plugin, OCM)  
      Other (please specify)  

      Deliverables:

      • MCP schema definition tailored for OVN-Kubernetes context (flows, endpoints, OVS states, DB sync states, etc.)
      • Initial Agentic AI workflows for common OVN-K issues (e.g., pod connectivity failures, gateway misconfigurations)
      • Integration of MCP with troubleshooting logs and telemetry inputs (e.g., must-gather, OVN traceflows)
      • User-facing interface for interacting with the AI agent (CLI, web console plugin, or API)
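
      To make the first deliverable more concrete, the following is a rough sketch of how an OVN-Kubernetes context schema could be modelled and serialized to JSON. Every class and field name here (`FlowRecord`, `EndpointRecord`, `DbSyncState`, `OvnKContext`) is hypothetical; the actual schema is still to be defined and would also need to cover OVS state.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class FlowRecord:
    """One flow entry as it might be captured from a node (hypothetical fields)."""
    bridge: str
    table: int
    match_fields: str
    actions: str
    n_packets: int = 0


@dataclass
class EndpointRecord:
    """A pod endpoint and where it is scheduled."""
    namespace: str
    pod: str
    node: str
    ip: str


@dataclass
class DbSyncState:
    """Sync status of one OVN database, e.g. "nb" or "sb"."""
    database: str
    in_sync: bool
    detail: str = ""


@dataclass
class OvnKContext:
    """Top-level context document for one cluster."""
    cluster: str
    flows: List[FlowRecord] = field(default_factory=list)
    endpoints: List[EndpointRecord] = field(default_factory=list)
    db_sync: List[DbSyncState] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses to plain dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "OvnKContext":
        raw = json.loads(payload)
        return cls(
            cluster=raw["cluster"],
            flows=[FlowRecord(**f) for f in raw.get("flows", [])],
            endpoints=[EndpointRecord(**e) for e in raw.get("endpoints", [])],
            db_sync=[DbSyncState(**d) for d in raw.get("db_sync", [])],
        )


if __name__ == "__main__":
    ctx = OvnKContext(
        cluster="cluster-a",
        db_sync=[DbSyncState(database="nb", in_sync=True)],
    )
    print(ctx.to_json())
```

      Plain dataclasses keep the sketch dependency-free; a real schema would probably add versioning and stricter validation.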

      Use Cases (Optional):

      Include use case diagrams, main success scenarios, alternative flow scenarios.  Initial completion during Refinement status.

      • EgressIPs troubleshooting
      • UDNs troubleshooting
      • BGP troubleshooting

      Questions to Answer (Optional):

      •  

      Out of Scope

      • Real-time packet analysis or active traffic interception
      • Replacing manual network debugging for advanced edge cases outside current agent capabilities

      Background

      Documentation Considerations

      • Documentation must be available for extending MCP schemas and training new workflows

      Interoperability Considerations

      •  

       

              Marc Curry (mcurry@redhat.com)
              Aniket Bhat, Ben Bennett
              Surya Seetharaman
              Ashley Hardin
              Chris Fields