Feature
Resolution: Unresolved
Major
Product / Portfolio Work
100% To Do, 0% In Progress, 0% Done
Feature Overview (aka. Goal Summary)
This feature introduces Model Context Protocol (MCP) integration into OpenShift Networking to enable agentic, AI-driven troubleshooting for Ingress, Gateway API, and DNS-related failures.
By exposing structured, real-time networking context (configuration, topology, policies, and telemetry) through MCP servers, OpenShift enables AI agents to analyze a live cluster's networking state, correlate signals across layers, and guide operators through faster root-cause analysis and remediation.
The goal is to transform OpenShift Ingress and DNS troubleshooting from a manual, log-driven, expert-only process into an assisted, explainable, and repeatable workflow powered by agentic AI, without replacing existing observability tools or requiring Service Mesh adoption.
Goals (aka. expected user outcomes)
- Enable AI agents to reason over OpenShift Ingress, Gateway API, and DNS state using MCP as a standardized context interface
- Reduce mean time to detection (MTTD) and mean time to resolution (MTTR) for networking-related application outages
- Provide explainable, step-by-step troubleshooting guidance instead of opaque “black-box” recommendations
- Leverage existing OpenShift Networking components (Ingress Operator, DNS Operator, Gateway API) rather than introducing parallel systems
- Align with upstream MCP to avoid vendor lock-in and encourage extensibility
Requirements (aka. Acceptance Criteria):
Functional Requirements
- Expose Ingress, Gateway API, and DNS networking context via one or more MCP servers, including:
  - Resource configuration (Ingress, Gateway, HTTPRoute, DNS, Services, Endpoints)
  - Traffic status and failure signals (health checks, routing status, error codes)
- Support read-only MCP access for troubleshooting and diagnostics (no direct mutation)
- Enable correlation across layers (DNS → Ingress/Gateway → Service → Pod)
- Provide RBAC-aware context exposure, ensuring agents only see data permitted to the requesting user
- Integrate cleanly with existing agent frameworks, including OpenShift AI and external MCP-compatible agents
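A minimal sketch of how the read-only and RBAC-aware requirements above might be enforced at the tool-dispatch layer. It uses only the Python standard library and JSON-RPC 2.0 message shapes modeled on the MCP `tools/list` and `tools/call` methods (tool results carry a `content` list and an `isError` flag). The tool name `get_route_status`, the in-memory `SNAPSHOT`, and the `readable_namespaces` set are hypothetical stand-ins: a real server would read live objects from the API server and would likely delegate the permission decision to Kubernetes authorization (for example a SubjectAccessReview) rather than a static set.

```python
import json
from typing import Any, Callable

# Hypothetical, pre-fetched cluster state keyed by (namespace, name).
# A real server would read this from the API server on each call.
SNAPSHOT: dict[tuple[str, str], dict[str, Any]] = {
    ("shop", "frontend"): {"host": "shop.apps.example.com", "admitted": True},
    ("billing", "api"): {"host": "billing.apps.example.com", "admitted": False},
}


def get_route_status(args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical read-only tool: return the stored status of one route."""
    key = (args.get("namespace"), args.get("name"))
    if key not in SNAPSHOT:
        raise LookupError(f"route {key[0]}/{key[1]} not found")
    return SNAPSHOT[key]


Handler = Callable[[dict[str, Any]], dict[str, Any]]

# Tool registry: only read-only handlers are registered, so no mutation is reachable.
TOOLS: dict[str, tuple[dict[str, Any], Handler]] = {
    "get_route_status": (
        {
            "name": "get_route_status",
            "description": "Read the admission status of a route (read-only).",
            "inputSchema": {
                "type": "object",
                "properties": {"namespace": {"type": "string"}, "name": {"type": "string"}},
                "required": ["namespace", "name"],
            },
            # Advisory hint to clients; enforcement comes from the registry contents.
            "annotations": {"readOnlyHint": True},
        },
        get_route_status,
    ),
}


def handle(request: dict[str, Any], readable_namespaces: set[str]) -> dict[str, Any]:
    """Answer one JSON-RPC request on behalf of a user who may read readable_namespaces."""
    rid = request.get("id")
    method = request.get("method")
    if method == "tools/list":
        tools = [spec for spec, _ in TOOLS.values()]
        return {"jsonrpc": "2.0", "id": rid, "result": {"tools": tools}}
    if method != "tools/call":
        return {"jsonrpc": "2.0", "id": rid,
                "error": {"code": -32601, "message": f"Method not found: {method}"}}
    params = request.get("params", {})
    name = params.get("name")
    args = params.get("arguments", {})
    if name not in TOOLS:
        return {"jsonrpc": "2.0", "id": rid,
                "error": {"code": -32602, "message": f"Unknown tool: {name}"}}
    if args.get("namespace") not in readable_namespaces:
        # RBAC-style filter: never return data the requesting user cannot read.
        text, is_error = "forbidden: namespace is not readable by the requesting user", True
    else:
        try:
            text, is_error = json.dumps(TOOLS[name][1](args)), False
        except LookupError as exc:
            text, is_error = str(exc), True
    result = {"content": [{"type": "text", "text": text}], "isError": is_error}
    return {"jsonrpc": "2.0", "id": rid, "result": result}
```

In this sketch the `readOnlyHint` annotation only informs the agent; the actual read-only guarantee comes from the fact that no mutating handler is ever registered, and the namespace check runs before any data is read.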
Non-Functional Requirements
- Minimal performance overhead on control plane and data plane components
- Secure, auditable access to MCP endpoints
- No requirement for Service Mesh, LLM deployment, or external SaaS
- Compatible with disconnected and regulated environments
Anyone reviewing this Feature needs to know which deployment configurations the Feature will apply to (or not) once it has been completed. Describe specific needs (or indicate N/A) for each of the following deployment scenarios. For specific configurations that are out of scope for a given release, also provide the OCPSTRAT for the future supported configuration.
| Deployment considerations | List applicable specific needs (N/A = not applicable) |
| Self-managed, managed, or both | |
| Classic (standalone cluster) | |
| Hosted control planes | |
| Multi node, Compact (three node), or Single node (SNO), or all | |
| Connected / Restricted Network | |
| Architectures, e.g. x86_64, ARM (aarch64), IBM Power (ppc64le), and IBM Z (s390x) | |
| Operator compatibility | |
| Backport needed (list applicable versions) | |
| UI need (e.g. OpenShift Console, dynamic plugin, OCM) | |
| Other (please specify) | |
Use Cases:
**TO BE REFINED**
1. Ingress Traffic Failure Diagnosis
Example: An operator asks an AI agent why external traffic is failing for an application.
The agent uses MCP to:
- Inspect Ingress/Gateway configuration
- Validate DNS records and resolution paths
- Check backend service endpoints and readiness
- Identify misconfigurations (TLS mismatch, route conflict, policy block)
- Explain the failure and recommend corrective actions
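The layered walk in this use case can be sketched as a function that checks each hop in order and stops at the first broken layer, keeping the passing checks as evidence so the explanation stays traceable. This is an illustration under stated assumptions: the `Snapshot` fields, the route dictionary keys (`service`, `tls`, `admitted`), and the single `ingress_address` are hypothetical simplifications of what an agent would gather through MCP tools, not an actual OpenShift data model.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical, pre-fetched view of what the agent would gather through MCP tools."""
    dns: dict[str, str]                    # hostname -> resolved address
    routes: dict[str, dict]                # hostname -> {"service": str, "tls": bool, "admitted": bool}
    services: set[str]                     # names of existing Services
    ready_endpoints: dict[str, list[str]]  # Service name -> ready pod IPs
    ingress_address: str                   # address clients should reach (hypothetical)


@dataclass
class Finding:
    layer: str
    ok: bool
    evidence: str


def diagnose(snap: Snapshot, host: str, client_uses_tls: bool) -> list[Finding]:
    """Walk DNS -> Ingress -> Service -> Endpoints and stop at the first broken layer."""
    findings: list[Finding] = []

    def fail(layer: str, evidence: str) -> list[Finding]:
        findings.append(Finding(layer, False, evidence))
        return findings

    addr = snap.dns.get(host)
    if addr != snap.ingress_address:
        return fail("DNS", f"{host} resolves to {addr or 'nothing'}, expected {snap.ingress_address}")
    findings.append(Finding("DNS", True, f"{host} -> {addr}"))

    route = snap.routes.get(host)
    if route is None or not route["admitted"]:
        return fail("Ingress", f"no admitted route serves host {host}")
    if client_uses_tls and not route["tls"]:
        return fail("Ingress", f"route for {host} has no TLS configuration, but clients use HTTPS")
    findings.append(Finding("Ingress", True, f"route admitted, backend Service {route['service']}"))

    svc = route["service"]
    if svc not in snap.services:
        return fail("Service", f"backend Service {svc} does not exist")
    findings.append(Finding("Service", True, f"Service {svc} exists"))

    ready = snap.ready_endpoints.get(svc, [])
    if not ready:
        return fail("Endpoints", f"Service {svc} has no ready endpoints")
    findings.append(Finding("Endpoints", True, f"{len(ready)} ready endpoint(s) behind {svc}"))
    return findings


if __name__ == "__main__":
    snap = Snapshot(
        dns={"shop.apps.example.com": "203.0.113.10"},
        routes={"shop.apps.example.com": {"service": "frontend", "tls": True, "admitted": True}},
        services={"frontend"},
        ready_endpoints={"frontend": []},
        ingress_address="203.0.113.10",
    )
    for f in diagnose(snap, "shop.apps.example.com", client_uses_tls=True):
        print(f"[{'ok' if f.ok else 'FAIL'}] {f.layer}: {f.evidence}")
```

Stopping at the first failing layer mirrors how traffic actually flows: a missing DNS record makes the backend checks irrelevant, so reporting them would only add noise to the explanation.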
2. Intermittent DNS Resolution Issues
Example: An application experiences sporadic name resolution failures.
The agent correlates:
- DNS Operator state and logs
- Pod-level DNS configuration
- Network policy impacts
- Node-local DNS cache behavior
The agent then produces a human-readable diagnosis with supporting evidence.
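One concrete piece of the pod-level DNS correlation is working out which names the pod's stub resolver will actually query. The sketch below assumes glibc-style behavior (other resolvers, for example musl, can differ): a name with fewer dots than `ndots` is tried against every search domain before being tried as-is, a name with at least `ndots` dots is tried as-is first, and a trailing dot disables search expansion. Kubernetes pods using the cluster DNS policy commonly get `ndots:5` and a namespace-specific search list, so an external name can generate several failed lookups before the real one, which is one plausible contributor to intermittent latency or timeouts. The `resolv.conf` contents shown are illustrative only.

```python
def parse_resolv_conf(text: str) -> tuple[list[str], int]:
    """Return (search domains, ndots) from resolv.conf text; the last search/domain line wins."""
    search: list[str] = []
    ndots = 1  # glibc default when no ndots option is present
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith(("#", ";")):
            continue  # blank line or comment
        if parts[0] == "search":
            search = parts[1:]
        elif parts[0] == "domain":
            search = parts[1:2]
        elif parts[0] == "options":
            for opt in parts[1:]:
                if opt.startswith("ndots:"):
                    ndots = int(opt.split(":", 1)[1])
    return search, ndots


def query_order(name: str, search: list[str], ndots: int) -> list[str]:
    """Fully qualified names a glibc-style stub resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already absolute: no search-list expansion
    absolute = name + "."
    expanded = [f"{name}.{domain.rstrip('.')}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded
    return expanded + [absolute]


if __name__ == "__main__":
    pod_resolv_conf = """\
search shop.svc.cluster.local svc.cluster.local cluster.local
nameserver 172.30.0.10
options ndots:5
"""
    search, ndots = parse_resolv_conf(pod_resolv_conf)
    for fqdn in query_order("api.example.com", search, ndots):
        print(fqdn)
```

With the illustrative file above, `api.example.com` (two dots, fewer than five) is tried under all three cluster search domains before the absolute name, so an agent could flag the expansion as evidence alongside DNS Operator logs and network policy findings.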
3. Gateway API Route Debugging
Example: A platform engineer investigates why an HTTPRoute is not receiving traffic.
The agent:
- Traverses Gateway → Listener → Route → Service bindings
- Identifies invalid references or unmet constraints
- Explains why the route is rejected or inactive
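The "why is this route rejected" step can lean on status that Gateway API controllers already write: each entry in an HTTPRoute's `status.parents` carries a `parentRef` plus `Accepted` and `ResolvedRefs` conditions with a `reason`. The sketch below works on the route as a parsed JSON dictionary (for example the output of `oc get httproute <name> -o json`) and explains each `parentRef`. The hint texts, the `controllerName` value, and the object names in the example are illustrative; only the four listed reasons get a hint, and any other reason falls back to the controller's own message.

```python
# Hints for a few standard Gateway API condition reasons; unknown reasons fall back to the message.
HINTS = {
    "NotAllowedByListeners": "no listener's allowedRoutes admits this route's namespace or kind",
    "NoMatchingListenerHostname": "the route's hostnames do not intersect any listener hostname",
    "BackendNotFound": "a backendRef points at a Service that does not exist",
    "RefNotPermitted": "a cross-namespace backendRef is not permitted by a ReferenceGrant",
}
CHECKED = ("Accepted", "ResolvedRefs")  # positive-polarity conditions: "True" means healthy


def _parent_key(ref: dict, route_ns: str) -> tuple:
    # parentRef.namespace defaults to the route's own namespace.
    return (ref.get("name"), ref.get("namespace", route_ns), ref.get("sectionName"))


def explain_route(route: dict) -> list[str]:
    """Explain, per parentRef, why an HTTPRoute (as a parsed dict) is or is not active."""
    ns = route["metadata"]["namespace"]
    reported = {
        _parent_key(p["parentRef"], ns): p
        for p in route.get("status", {}).get("parents", [])
    }
    lines: list[str] = []
    for ref in route["spec"].get("parentRefs", []):
        name, ref_ns, section = _parent_key(ref, ns)
        label = f"Gateway {ref_ns}/{name}" + (f" listener {section}" if section else "")
        parent = reported.get((name, ref_ns, section))
        if parent is None:
            lines.append(f"{label}: no status reported; the Gateway may be missing or unmanaged")
            continue
        conds = {c["type"]: c for c in parent.get("conditions", [])}
        problems = []
        for ctype in CHECKED:
            cond = conds.get(ctype)
            if cond is None:
                problems.append(f"{ctype} not reported")
            elif cond["status"] != "True":
                reason = cond.get("reason", "")
                hint = HINTS.get(reason, cond.get("message", ""))
                problems.append(f"{ctype}={cond['status']} ({reason}): {hint}")
        lines.extend(f"{label}: {p}" for p in problems)
        if not problems:
            lines.append(f"{label}: Accepted and ResolvedRefs are True")
    return lines


if __name__ == "__main__":
    route = {
        "metadata": {"name": "checkout", "namespace": "shop"},
        "spec": {"parentRefs": [
            {"name": "public-gw", "namespace": "gateways", "sectionName": "https"},
            {"name": "internal-gw"},
        ]},
        "status": {"parents": [{
            "parentRef": {"name": "public-gw", "namespace": "gateways", "sectionName": "https"},
            "controllerName": "example.com/gateway-controller",
            "conditions": [
                {"type": "Accepted", "status": "False", "reason": "NotAllowedByListeners"},
                {"type": "ResolvedRefs", "status": "True", "reason": "ResolvedRefs"},
            ],
        }]},
    }
    for line in explain_route(route):
        print(line)
```

A `parentRef` with no matching status entry is reported separately, because a silent route usually means no controller has processed it at all, which is a different failure from an explicit rejection.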
Out of Scope
Background
Troubleshooting OpenShift networking, particularly Ingress, Gateway API, and DNS, requires deep domain knowledge and manual correlation across multiple APIs, operators, logs, and metrics. While OpenShift provides strong observability and diagnostics, the cognitive burden remains high and grows with cluster size.
Model Context Protocol (MCP) provides a standardized way to expose structured system context to AI agents, enabling reasoning over live infrastructure state. By integrating MCP into OpenShift Networking, customers and internal teams can enable agentic workflows that natively understand Kubernetes networking semantics, rather than relying on less sophisticated tooling or manual procedures.
Customer Considerations
- Security & Trust: Customers must retain full control over what data is exposed to AI agents. MCP endpoints must respect OpenShift RBAC, tenancy, and audit requirements.
- Explainability: Customers expect AI-assisted troubleshooting to be transparent and actionable, not a “magic answer.” Clear reasoning paths and evidence are critical.
- Operational Simplicity: The feature should work out of the box with existing OpenShift installations, without requiring customers to deploy or manage LLM infrastructure.
- Regulated & Disconnected Environments: The solution must function in air-gapped, on-prem, and regulated environments, with no dependency on external AI services.
- Incremental Adoption: Customers should be able to adopt MCP-based troubleshooting alongside existing tools (must-gather, logs, metrics) rather than replacing them.
Documentation Considerations
Interoperability Considerations
- is related to
  - NE-2176 [SPIKE] Simple MCP Server for Ingress troubleshooting (status: Review)
  - NE-2277 Build MCP server exposing read-only tools and diagnostics for Network Ingress and DNS subsystems (status: Closed)
  - OCPSTRAT-2350 Integrate Model Context Protocol for Agentic AI-driven OVN-Kubernetes Troubleshooting (status: In Progress)
- links to