-
Epic
-
Resolution: Unresolved
-
MCP NIDS Troubleshooting
-
To Do
-
Product / Portfolio Work
-
100% To Do, 0% In Progress, 0% Done
Epic Goal
Develop and validate a Model Context Protocol (MCP) based troubleshooting framework for Network Ingress & DNS (NIDS), delivered as an MCP server. The server will expose read-only tools and diagnostics for the network ingress and DNS subsystems, support both live-cluster and offline (must-gather) analysis modes, and integrate with large language models (LLMs) through MCP-compliant clients for contextual troubleshooting. The prototype will be validated through structured evaluation with unit, e2e, and integration testing (via the gevals project), and coverage will then expand to include Gateway API and route reconciliation flows.
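One way to picture the live-cluster and offline (must-gather) modes is a single read-only interface with two interchangeable backends, so each diagnostic is written once. The sketch below is illustrative only: `ResourceSource`, `MustGatherSource`, `LiveSource`, and `condition_status` are hypothetical names, the `<root>/<kind>/<name>.json` directory layout is an assumption and not the real must-gather layout, and the live backend takes a caller-supplied fetch function rather than a real cluster client.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Protocol


class ResourceSource(Protocol):
    """Read-only access to a resource, whatever the backing mode is."""

    def get(self, kind: str, name: str) -> dict:
        ...


class MustGatherSource:
    """Offline mode: reads resources from JSON files on disk.

    Assumes a hypothetical <root>/<kind>/<name>.json layout.
    """

    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def get(self, kind: str, name: str) -> dict:
        return json.loads((self.root / kind / f"{name}.json").read_text())


class LiveSource:
    """Live mode: delegates to a caller-supplied fetch function (hypothetical)."""

    def __init__(self, fetch: Callable[[str, str], dict]) -> None:
        self._fetch = fetch

    def get(self, kind: str, name: str) -> dict:
        return self._fetch(kind, name)


def condition_status(
    source: ResourceSource, kind: str, name: str, cond_type: str
) -> Optional[str]:
    """Return the status of one condition type, or None if it is absent."""
    resource = source.get(kind, name)
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None
```

Because `condition_status` only depends on `get`, the same check can run against a live cluster or an unpacked archive without modification.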
Why is this important?
Diagnosing failures across Ingress, DNS, and Gateway API components in OpenShift is currently a multi-layered process requiring expertise across Kubernetes resources, operator logs, HAProxy routing, CoreDNS resolution, and node networking. Combining these debugging tools and failure signatures is time-consuming and error-prone. A structured MCP server will streamline NIDS troubleshooting, reducing time-to-diagnosis, improving consistency, and enabling scalable, reproducible workflows.
For more background see:
https://docs.google.com/document/d/19hToi_iUN5wu4LBXdxMu4cPW7hOEFy8LIJ7vGv_DKOw/edit?tab=t.0
Scenarios
- Support and dev/quality engineers use MCP server tooling to diagnose NIDS issues across multiple subsystems.
- Expose read-only diagnostics for network ingress and DNS subsystems for consistent and secure troubleshooting.
- Allow both live-cluster and offline (must-gather) analysis modes to support various operational contexts.
- Integrate LLMs for contextual troubleshooting via MCP-compliant clients, enabling AI-assisted insights.
- Validate the prototype through structured unit, end-to-end, and integration testing (via the gevals project).
- Extend the MCP server to include Gateway API and route reconciliation flows.
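The read-only constraint in the scenarios above could be enforced at registration time, so that a mutating tool can never be exposed to a client. The following is a minimal sketch under stated assumptions: it deliberately uses no MCP SDK, and `Tool`, `ToolRegistry`, and the example `split_hostname` tool are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Tool:
    """A diagnostic exposed to clients; read_only marks it as non-mutating."""

    name: str
    description: str
    handler: Callable[[dict], Any]
    read_only: bool = True


class ToolRegistry:
    """Holds tools by name and refuses anything not marked read-only."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        if not tool.read_only:
            raise ValueError(f"tool {tool.name!r} is not read-only")
        if tool.name in self._tools:
            raise ValueError(f"tool {tool.name!r} is already registered")
        self._tools[tool.name] = tool

    def list_tools(self) -> List[dict]:
        """Return name/description pairs, sorted by name for stable output."""
        return [
            {"name": t.name, "description": t.description}
            for t in sorted(self._tools.values(), key=lambda t: t.name)
        ]

    def call(self, name: str, arguments: dict) -> Any:
        """Invoke a registered tool; raises KeyError for unknown names."""
        return self._tools[name].handler(arguments)


def split_hostname(arguments: dict) -> dict:
    """Example tool (hypothetical): split a route hostname into DNS labels."""
    host = arguments["host"]
    return {"host": host, "labels": host.split(".")}
```

Rejecting non-read-only tools in `register` rather than in `call` keeps the check in one place and makes a misconfigured tool fail at startup instead of mid-diagnosis.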
Acceptance Criteria
- CI pipelines must run successfully with automated tests covering the MCP server functionalities.
- Release technical enablement materials and documentation are provided for stakeholders.
- Structured evaluation of the prototype via unit, e2e, and integration tests is complete.
- Automated tests are merged into the repository.
Dependencies
- Development of read-only diagnostics and live/offline modes is a prerequisite for LLM integration.
- Testing and validation depend on completion of core MCP server features.
- Gateway API and route reconciliation extension depends on initial NIDS server implementation.
Previous Work (Optional):
- None.
Open questions:
- What specific logs and metrics should be prioritized for initial MCP diagnostics?
- How will the LLM integration interface with existing MCP clients?
Done Checklist
- CI is running and tests are automated and merged.
- Release enablement: Provide feature enablement presentation/documentation.
- Dev: Upstream code and tests merged; upstream documentation merged; downstream build attached to advisory.
- QE: Test plans in Polarion and automated tests merged.
- DOC: Downstream documentation merged.