Bug
Resolution: Unresolved
Major
NetObserv - Sprint 257, NetObserv - Sprint 258, NetObserv - Sprint 259, NetObserv - Sprint 260, NetObserv - Sprint 261, NetObserv - Sprint 262
The way metrics are queried differs between Loki and Prometheus when a filter is set on a drops-related label such as drop cause, so the two backends return different results.
For instance, say we have a query with the filter DROP CAUSE=TCP_INVALID_SEQUENCE, and the "Drops filter" query option set to "All":
- With Loki, the query returns the bytes/packets counters of all flows that have some TCP_INVALID_SEQUENCE drops, including the bytes/packets that were not dropped (unless the user explicitly asks for dropped bytes/packets).
- With Prometheus, the query returns only the dropped bytes/packets counters that have TCP_INVALID_SEQUENCE.
So we generally see lower values with Prometheus.
—
2nd example: say we have a query with the filter DROP CAUSE!=TCP_INVALID_SEQUENCE (not equals), and the "Drops filter" query option set to "All":
- With Loki, the query returns the bytes/packets counters of all flows that have drops not caused by TCP_INVALID_SEQUENCE. For some reason, it does not include flows that have no drops at all.
- With Prometheus, the query returns the dropped bytes/packets counters that do not have TCP_INVALID_SEQUENCE.
In this example, both Loki and Prometheus look wrong, because they both ignore flows without drops.
—
To help reason about this issue, here's a more concrete example. Imagine we have just these 3 flows:
FLOW 1 {X => Y, pkt: 30, dropPkt: 10, dropCause: CAUSE_A}
FLOW 2 {X => Y, pkt: 40, dropPkt: 5, dropCause: CAUSE_B}
FLOW 3 {X => Y, pkt: 30, (nodrop)}
Flows contain a mix of dropped and not dropped packets.
Translated into pseudo-metrics, here's what we get:
metric_drops_packets{src=X,dst=Y,cause=CAUSE_A}: 10
metric_drops_packets{src=X,dst=Y,cause=CAUSE_B}: 5
metric_packets{src=X,dst=Y}: 100
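As a rough illustration of that translation, the sketch below aggregates the three example flows into the two pseudo-metrics. The tuple layout and the variable names (`flows`, `packets`, `drops`) are assumptions made for this example only, not the actual NetObserv data model.

```python
from collections import Counter

# Hypothetical representation of the three example flows:
# (src, dst, pkt, drop_pkt, drop_cause); drop_cause is None when nothing was dropped.
flows = [
    ("X", "Y", 30, 10, "CAUSE_A"),
    ("X", "Y", 40, 5, "CAUSE_B"),
    ("X", "Y", 30, 0, None),
]

packets = Counter()  # stands in for metric_packets{src,dst}
drops = Counter()    # stands in for metric_drops_packets{src,dst,cause}

for src, dst, pkt, drop_pkt, cause in flows:
    packets[(src, dst)] += pkt
    if cause is not None:
        drops[(src, dst, cause)] += drop_pkt

print(dict(packets))  # total packets per (src, dst)
print(dict(drops))    # dropped packets per (src, dst, cause)
```

Note that in this sketch only the drops metric carries a cause label, so a filter on the cause alone cannot select anything from the total-packets metric.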
Now for different queries, here's expectation vs actual:
Query             Expected   Actual Loki    Actual Prometheus
(no filter)       100        100            100
cause=CAUSE_A     10         30 (or 10)*    10
cause=CAUSE_*     15         70 (or 15)*    15
cause!=CAUSE_A    90         40 (or 5)*     5
cause!=CAUSE_*    85         0 (or 0)*      0
*: when setting explicitly dropBytes/Packets as the metric to fetch in the UI options
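The numbers in this comparison can be reproduced with a small model. The sketch below is only one reading of the behaviours described in this issue, not the real query code: the function names (`expected`, `actual_loki`, `actual_prometheus`) are hypothetical, and the use of glob matching via `fnmatchcase` for `CAUSE_*` is an assumption.

```python
from fnmatch import fnmatchcase

# The three example flows; drop_cause is None when nothing was dropped.
FLOWS = [
    {"pkt": 30, "drop_pkt": 10, "drop_cause": "CAUSE_A"},
    {"pkt": 40, "drop_pkt": 5, "drop_cause": "CAUSE_B"},
    {"pkt": 30, "drop_pkt": 0, "drop_cause": None},
]


def matches(flow, pattern):
    """True if the flow has a drop cause matching the (glob) pattern."""
    cause = flow["drop_cause"]
    return cause is not None and fnmatchcase(cause, pattern)


def expected(pattern=None, negate=False):
    """Expected: '=' counts matching drops, '!=' counts everything else."""
    total = sum(f["pkt"] for f in FLOWS)
    if pattern is None:
        return total
    matched = sum(f["drop_pkt"] for f in FLOWS if matches(f, pattern))
    return total - matched if negate else matched


def actual_loki(pattern=None, negate=False, metric="pkt"):
    """Observed Loki behaviour: selects whole flows, ignoring flows without drops."""
    if pattern is None:
        return sum(f[metric] for f in FLOWS)
    selected = [
        f for f in FLOWS
        if f["drop_cause"] is not None and matches(f, pattern) != negate
    ]
    return sum(f[metric] for f in selected)


def actual_prometheus(pattern=None, negate=False):
    """Observed Prometheus behaviour: only the drops metric is filtered."""
    if pattern is None:
        return sum(f["pkt"] for f in FLOWS)
    return actual_loki(pattern, negate, metric="drop_pkt")


for label, pattern, negate in [
    ("(no filter)", None, False),
    ("cause=CAUSE_A", "CAUSE_A", False),
    ("cause=CAUSE_*", "CAUSE_*", False),
    ("cause!=CAUSE_A", "CAUSE_A", True),
    ("cause!=CAUSE_*", "CAUSE_*", True),
]:
    print(label, expected(pattern, negate), actual_loki(pattern, negate),
          actual_prometheus(pattern, negate))
```

In this model the expected value for a "not equals" filter is the total minus the matching drops, which is why flows without drops have to be counted; both observed behaviours leave them out.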
- is related to: NETOBSERV-1649 Follow-up: manage drop metrics via prom (Closed)