-
Feature Request
-
Resolution: Unresolved
Description:
Add a network analysis feature to OpenShift that enables detailed monitoring, troubleshooting, and analysis of internal network traffic between pods and within namespaces, categorized by devices, protocols, and ports. This data should be exported as Prometheus metrics, with the intent of complementing existing network interface metrics provided by the current Observability stack.
Problem Statement:
Telecom operators and Network Equipment Providers (NEPs) operating within Kubernetes environments often need detailed visibility into the internal network traffic between pods and namespaces. Current observability features focus on network interface traffic (e.g., aggregate ingress and egress metrics), but they do not show how traffic flows within the cluster, categorized by devices, protocols, and ports. This limitation makes it difficult to troubleshoot, optimize, and secure network operations within the cluster.
For instance, operators frequently need to monitor:
- Pod-to-pod communication patterns.
- Protocol usage within namespaces (e.g., UDP, TCP).
- Traffic load distributed across specific devices or interfaces within the Kubernetes network infrastructure.
- Anomalies in port usage (e.g., unexpected traffic on certain ports).
Without these detailed insights, network-related issues within Kubernetes (such as bottlenecks, unexpected traffic, or traffic loss) can be difficult to detect and resolve.
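As an illustration of the last monitoring need (anomalies in port usage), the check below is a minimal sketch rather than a proposed implementation. It assumes flow observations are already available as `(namespace, protocol, port)` tuples and that operators maintain a per-namespace list of expected port ranges; the function name, the data shapes, and all namespaces and port numbers in the sample data are hypothetical.

```python
def find_unexpected_flows(flows, expected):
    """Return flows whose (protocol, port) is not expected for their namespace.

    flows:    iterable of (dst_namespace, protocol, dst_port) tuples.
    expected: dict mapping namespace -> list of (protocol, low_port, high_port).
    Namespaces without any rule are treated as "nothing expected".
    """
    unexpected = []
    for namespace, protocol, port in flows:
        rules = expected.get(namespace, [])
        allowed = any(
            rule_proto == protocol and low <= port <= high
            for rule_proto, low, high in rules
        )
        if not allowed:
            unexpected.append((namespace, protocol, port))
    return unexpected


# Hypothetical sample data, for illustration only.
EXPECTED = {
    "cnf-a": [("SCTP", 30000, 30010), ("TCP", 8443, 8443)],
}
OBSERVED = [
    ("cnf-a", "SCTP", 30005),  # inside the expected SCTP range
    ("cnf-a", "TCP", 8443),    # expected single port
    ("cnf-a", "TCP", 2222),    # not in any expected range
    ("cnf-b", "UDP", 53),      # namespace has no rules at all
]

if __name__ == "__main__":
    for flow in find_unexpected_flows(OBSERVED, EXPECTED):
        print("unexpected traffic:", flow)
```

In practice the same comparison could be expressed as an alerting rule over the exported metrics; the sketch only shows the kind of per-port, per-protocol data such a rule would need.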
Examples:
- Syscom messages SCTP 49xxx-49yyy
- CNF internal database TCP 21xxx-21yyy...
- Traffic Manager TCP: 3xxxx
- ZMQ TCP....
- PVC ....
- tunnel UDP:xxx
- vCU, vDU, Nokia Operator, and OCP, specific namespace.
- Monitoring of the different CaaS internal traffic types: RX/TX bytes, RX/TX packets, errors, drops.
Proposed Enhancement:
Develop a network analysis feature in Kubernetes that allows for the monitoring, troubleshooting, and analysis of internal network traffic between pods and namespaces, with granularity by devices, protocols, and ports. The metrics can be exposed to Prometheus to support visualization via existing Prometheus/Grafana dashboards.
Specifically, the feature should:
- Track internal network traffic between pods in real-time, distinguishing between different protocols (e.g., TCP, UDP, SCTP, ICMP) and monitoring port-specific usage.
- Provide detailed breakdowns of traffic by:
  - Source and destination pods or namespaces.
  - Network devices (e.g., virtual interfaces, service meshes).
  - Protocols (e.g., HTTP, TCP, DNS).
  - Ports (e.g., traffic flowing over specific ports like 443 for HTTPS).
- Export this data as Prometheus metrics, including:
  - Gauges (for tracking current states, such as active connections).
  - Counters (for tracking traffic volumes over time).
  - Histograms (for traffic distribution by protocol or port).
  - Possible key metrics: internal_pod_traffic_bytes_total, internal_protocol_traffic_packets_total, pod_port_traffic_total, etc.
- Integrate these metrics into the existing Prometheus and Grafana stack to allow operators to visualize the internal network flows in real-time and generate actionable insights.
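To make the requested export more concrete, the following is a minimal sketch of how per-flow byte counts could be aggregated into a labeled counter and rendered in the Prometheus text exposition format. It uses only the Python standard library. The `Flow` record shape, the label names, and the sample pods and ports are assumptions made for illustration; only the metric name `internal_pod_traffic_bytes_total` comes from the list above. Label values are not escaped here, which a real exporter would need to handle.

```python
from collections import defaultdict
from typing import NamedTuple


class Flow(NamedTuple):
    """One observed pod-to-pod flow sample (hypothetical record shape)."""
    src_namespace: str
    src_pod: str
    dst_namespace: str
    dst_pod: str
    protocol: str  # e.g. "TCP", "UDP", "SCTP"
    dst_port: int
    num_bytes: int


# Assumed label names; the request does not prescribe them.
LABEL_NAMES = ("src_namespace", "src_pod", "dst_namespace",
               "dst_pod", "protocol", "dst_port")


def aggregate_bytes(flows):
    """Sum bytes per (source, destination, protocol, port) label set."""
    totals = defaultdict(int)
    for f in flows:
        key = (f.src_namespace, f.src_pod, f.dst_namespace,
               f.dst_pod, f.protocol, f.dst_port)
        totals[key] += f.num_bytes
    return dict(totals)


def render_exposition(totals, metric="internal_pod_traffic_bytes_total"):
    """Render the aggregated totals as Prometheus text exposition lines."""
    lines = [
        f"# HELP {metric} Bytes sent between pods inside the cluster.",
        f"# TYPE {metric} counter",
    ]
    for key in sorted(totals):
        labels = ",".join(f'{name}="{value}"'
                          for name, value in zip(LABEL_NAMES, key))
        lines.append(f"{metric}{{{labels}}} {totals[key]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical samples: two SCTP observations and one TCP observation.
    sample = [
        Flow("ns-a", "pod-1", "ns-b", "pod-2", "SCTP", 30005, 100),
        Flow("ns-a", "pod-1", "ns-b", "pod-2", "SCTP", 30005, 50),
        Flow("ns-a", "pod-1", "ns-b", "pod-2", "TCP", 443, 7),
    ]
    print(render_exposition(aggregate_bytes(sample)), end="")
```

A counter is used because byte volume only grows over time; dashboards would typically apply a rate function to it and group by whichever labels are of interest (namespace, protocol, or port).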
Business Justification:
Telecom operators rely on Kubernetes for the deployment and orchestration of cloud-native network functions (CNFs), which are inherently sensitive to networking conditions within the cluster. As network traffic volumes and complexity increase within large-scale environments, the need for detailed network visibility becomes more critical. This proposed feature will address key operational requirements:
- Improved Troubleshooting and Root Cause Analysis:
Operators will be able to identify network issues such as misconfigurations, bottlenecks, and unexpected traffic patterns, enabling faster resolution and improved system uptime.
- Enhanced Network Security:
By exposing detailed traffic flows and protocol usage, this feature will allow operators to detect and respond to suspicious or unauthorized traffic within the cluster (e.g., unexpected traffic on a non-standard port).
- Capacity Planning and Optimization:
Visibility into how network resources are being utilized by various CNFs will allow telecom operators to better manage capacity, optimize network traffic flows, and improve the overall efficiency of network operations.
- Regulatory Compliance:
Many telecom operators face strict regulatory requirements for monitoring and auditing network traffic. This feature will help meet those requirements by providing the necessary level of detail about internal network communications.
This enhancement also complements existing network observability solutions by focusing on traffic flows within the cluster, filling a key gap in Kubernetes' current monitoring capabilities.
Technical Considerations:
- Integration with CNI Plugins:
The network analysis feature should integrate seamlessly with OpenShift CNI (Container Network Interface) plugins, including OVN-Kubernetes, SR-IOV devices, and secondary CNIs provided by Multus (macvlan, ipvlan, etc.), to capture detailed traffic metrics.
- Resource Usage:
Monitoring traffic at this level of granularity could increase resource usage (CPU, memory) within the cluster. Efficient data collection and aggregation methods should be considered to minimize performance impact. RDS characterization will be required to validate supportability of this feature.
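One aggregation approach that could limit the resource cost is to reduce label cardinality before export, for example by rolling pod-level samples up to namespace level and grouping ports into coarse buckets. The sketch below is an assumption about how that might look, not part of the request: the function names, the sample tuple layout, and the choice of individually tracked ports are hypothetical. The bucket boundaries follow the IANA port ranges (0-1023 system, 1024-49151 registered, 49152-65535 dynamic).

```python
from collections import Counter


def port_bucket(port, tracked=(53, 80, 443)):
    """Map a port to a low-cardinality label value.

    Ports listed in `tracked` keep their own label; all others fall into
    one of three IANA range buckets.
    """
    if port in tracked:
        return str(port)
    if port < 1024:
        return "system"
    if port < 49152:
        return "registered"
    return "dynamic"


def rollup(samples):
    """Aggregate pod-level byte samples to namespace level.

    samples: iterable of
        (src_ns, src_pod, dst_ns, dst_pod, protocol, port, num_bytes).
    Pod names are dropped and ports are bucketed, so the number of
    distinct label sets no longer grows with the pod or port count.
    """
    totals = Counter()
    for src_ns, _src_pod, dst_ns, _dst_pod, protocol, port, num_bytes in samples:
        totals[(src_ns, dst_ns, protocol, port_bucket(port))] += num_bytes
    return totals


if __name__ == "__main__":
    # Hypothetical samples: three pod-level series collapse into two.
    samples = [
        ("ns-a", "pod-1", "ns-b", "pod-7", "TCP", 50001, 10),
        ("ns-a", "pod-2", "ns-b", "pod-8", "TCP", 50002, 20),
        ("ns-a", "pod-1", "ns-b", "pod-7", "TCP", 443, 5),
    ]
    for labels, total in sorted(rollup(samples).items()):
        print(labels, total)
```

The trade-off is a loss of per-pod and per-port detail, so a design might keep the fine-grained view available on demand for troubleshooting while exporting only the rolled-up series continuously.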
Priority:
Telecom operators and NEPs need this feature to obtain the advanced network observability required for troubleshooting and for ensuring performance and security within their Kubernetes clusters.
Success Criteria:
- Successful development of a network analysis feature that allows for the breakdown of internal network traffic between pods and namespaces by device, protocol, and port.
- Integration with the Prometheus/Grafana observability stack, with detailed dashboards available for network traffic metrics.
- Demonstrated ability to use this feature for troubleshooting network issues and optimizing network performance in a Kubernetes environment.
Requested by:
Telecom Operators & Network Equipment Providers (NEPs)
Target Release:
TBD – Align with other networking and observability features in the roadmap.