Type: Feature Request
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: Monitoring, Network - Observability
Labels:
None

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:
None
Products:
None
Hierarchy Progress Bar:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Impact Score:
PX Impact Range:
None
PX Priority Data:
None
PX Technical Impact:
None
PX Technical Impact Notes:
None
PX Scheduling Request:
None

1. Proposed title of this feature request

Improved network monitoring

2. What is the nature and description of the request?

Currently, OCP has no alerting rules for network issues such as heavy packet drops from NICs, softnet, or OVS, or large spikes in multicast/broadcast traffic (floods).

# RX drop %
rate(node_network_receive_drop_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.05
# TX drop %
rate(node_network_transmit_drop_total[2m]) / rate(node_network_transmit_packets_total[2m]) > 0.05
# softnet drop %
rate(node_softnet_dropped_total[2m]) / rate(node_softnet_processed_total[2m]) > 0.05

# unexpected protocol/packets
rate(node_network_receive_nohandler_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.01

# multicast flood (not sure whether an absolute threshold on multicast is appropriate)
rate(node_network_receive_multicast_total[2m]) > 100000
# perhaps alert when multicast/broadcast exceeds 90% of received packets, but this will trigger on idle links (ARP etc.)
rate(node_network_receive_multicast_total[2m]) / rate(node_network_receive_packets_total[2m]) > 0.9
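
The idle-link concern in the last expression could be handled by requiring a minimum packet rate before the ratio is evaluated. A minimal sketch as a Prometheus rule group follows. The 1000 packets/s floor, the 10m `for` duration, the severity, and the group and alert names are all assumptions for illustration and are not part of this request:

```yaml
groups:
  - name: node-network-multicast            # hypothetical group name
    rules:
      - alert: NodeNetworkMulticastFlood    # hypothetical alert name
        # Share of multicast among all received packets, evaluated only on
        # interfaces receiving more than 1000 packets/s (assumed floor), so
        # that ARP chatter on an otherwise idle link does not fire the alert.
        expr: |
          (
            rate(node_network_receive_multicast_total[2m])
              / rate(node_network_receive_packets_total[2m]) > 0.9
          )
          and
          rate(node_network_receive_packets_total[2m]) > 1000
        for: 10m                            # assumed; condition must hold continuously
        labels:
          severity: warning                 # assumed severity
```

On OCP, a group like this would typically sit under `spec.groups` of a `PrometheusRule` object rather than in a standalone rule file.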

The percentage could determine the severity of the alert, similar to the storage alerts, where the percentage of free space dictates the severity.
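
As a rough illustration of percentage-based severity, the RX drop expression could be split into a warning and a critical tier. The 5% threshold comes from the expressions in this request; the 10% critical threshold, the `for` durations, and the alert names are assumptions chosen only to show the shape:

```yaml
groups:
  - name: node-network-drops                 # hypothetical group name
    rules:
      - alert: NodeNetworkReceiveDropsHigh   # hypothetical alert name
        expr: |
          rate(node_network_receive_drop_total[2m])
            / rate(node_network_receive_packets_total[2m]) > 0.05
        for: 10m                             # assumed duration
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.device }} on {{ $labels.instance }} is dropping {{ $value | humanizePercentage }} of received packets"
      - alert: NodeNetworkReceiveDropsCritical   # hypothetical alert name
        expr: |
          rate(node_network_receive_drop_total[2m])
            / rate(node_network_receive_packets_total[2m]) > 0.10
        for: 5m                              # assumed duration
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.device }} on {{ $labels.instance }} is dropping {{ $value | humanizePercentage }} of received packets"
```

Above 10% both tiers would match at once, so an Alertmanager inhibit rule (or an upper bound on the warning expression) would be needed if only the critical alert should be delivered. The same pattern would apply to the TX and softnet expressions.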

https://github.com/prometheus-community/helm-charts/blob/211245fa1929d5ee581696305087ac551cafdcef/charts/kube-prometheus-stack/templates/prometheus/rules-1.14/node-exporter.yaml#L300C1-L300C36

3. Why does the customer need this? (List the business requirements here)

Customers are not currently notified when network interfaces are saturated for sustained periods.
This leads to outages and connectivity issues, especially when a large portion of TX/RX packets is being dropped.

4. List any affected packages or components.

monitoring manifests

is related to

NETOBSERV-2393 [GA] Ease of alerting and network health display for Network Observability

NETOBSERV-2356 Extend health to non-netobserv metrics

To Do

Assignee: Roger Florén

Reporter: Tim Dawson

Need Info From: None

Votes: 1

Watchers: 3

Created: 2025/08/12 1:43 AM

Updated: 2025/09/16 1:50 PM

Target start: None

Target end: None
