-
Feature Request
-
Resolution: Unresolved
-
Normal
-
x86_64
1. Proposed title of this feature request
OVN monitoring of traffic drop due to networkPolicies
2. What is the nature and description of the request?
Customers who rely heavily on networkPolicy and/or egressFirewall have difficulty detecting incorrect policies or bugs. There are various cases where, because of either an OVN issue or the way netpols are generated, traffic is not allowed as intended and the defaultDeny ACLs drop it. Today this is noticed only by the application itself or by the customer's third-party monitoring tools.
The customer requests an improvement to the OVN-Kubernetes metrics: a global count of drops caused by a missing ACL. Under normal conditions no traffic is dropped because of netpols, so an increase in the rate of this metric would be a good indicator that something is incorrect. For post-mortem analysis, the increase in drops would help determine when the issue started. Knowing whether the increase is on a specific node, global across a stack, or spread over multiple stacks would help tremendously in locating the issue.
Example metrics:
ovnkube_node_drops_total{acl="ingressDefaultDeny"}
ovnkube_node_drops_total{acl="egressDefaultDeny"}
ovnkube_node_drops_total{acl="egressFirewallDeny"}
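To illustrate how such a counter could be used, here is a minimal Python sketch that compares two scrapes of the proposed metric and flags nodes whose drop count jumped. The metric does not exist today; the node names, sample values, and threshold are made up for illustration, and the helper names `drop_increase` and `nodes_over_threshold` are hypothetical.

```python
from collections import defaultdict

def drop_increase(prev, curr):
    """Per-(node, acl) increase between two scrapes of a cumulative counter."""
    increase = {}
    for key, value in curr.items():
        before = prev.get(key, 0.0)
        # A counter that went down was reset (for example after a restart),
        # so the whole current value counts as new drops.
        increase[key] = value - before if value >= before else value
    return increase

def nodes_over_threshold(increase, threshold):
    """Names of nodes whose summed increase over all ACLs exceeds threshold."""
    per_node = defaultdict(float)
    for (node, _acl), delta in increase.items():
        per_node[node] += delta
    return sorted(node for node, total in per_node.items() if total > threshold)

# Hypothetical samples of ovnkube_node_drops_total, keyed by (node, acl).
prev = {
    ("node-a", "ingressDefaultDeny"): 10.0,
    ("node-a", "egressDefaultDeny"): 4.0,
    ("node-b", "ingressDefaultDeny"): 7.0,
}
curr = {
    ("node-a", "ingressDefaultDeny"): 12.0,
    ("node-a", "egressDefaultDeny"): 4.0,
    ("node-b", "ingressDefaultDeny"): 250.0,
}

increase = drop_increase(prev, curr)
suspects = nodes_over_threshold(increase, threshold=100.0)
```

In a real deployment this comparison would more likely be an alerting rule in the monitoring stack than custom code; the sketch only shows the per-node versus global reasoning described above.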
3. Why does the customer need this? (List the business requirements here)
Customers with a large number of networkPolicies and egressFirewalls can have trouble tracking down issues caused by incorrectly implemented rules or by bugs in the CNI, which can have a high impact on application availability.
4. List any affected packages or components.
OCP with OVN-Kubernetes