[NETOBSERV-1740] Discrepancies with drop queries between Loki & Prometheus

Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Loki
Labels: None
Blocked: False
Blocked Reason: None
Ready: False

Sprint: NetObserv - Sprint 257 through NetObserv - Sprint 268 (12 consecutive sprints)


The way metrics are queried differs between Loki and Prometheus when a filter is set on a drops-related label such as drop cause. As a result, the two backends return different values for the same query.

For instance, take a query with the filter DROP CAUSE=TCP_INVALID_SEQUENCE and, in the query options, "Drops filter" set to "All".

With Loki, the query returns the bytes/packets counters of all flows that have some TCP_INVALID_SEQUENCE drops, including the bytes/packets of those flows that were not dropped (unless the user explicitly asks for dropped bytes/packets).

With Prometheus, the query returns only the dropped bytes/packets counters whose cause is TCP_INVALID_SEQUENCE.

So we generally see lower values with Prometheus.

—
Second example: take a query with the filter DROP CAUSE!=TCP_INVALID_SEQUENCE (not equals) and, in the query options, "Drops filter" set to "All".

With Loki, the query returns the bytes/packets counters of all flows that have drops, except those caused by TCP_INVALID_SEQUENCE. For some reason, it does not include flows that have no drop at all.
With Prometheus, the query returns the dropped bytes/packets counters whose cause is not TCP_INVALID_SEQUENCE.

In this example, both Loki and Prometheus look wrong, because they both ignore flows without drops.

—

To help reason about this issue, here's a more concrete example. Imagine we have just these 3 flows:

FLOW 1 {X => Y, pkt: 30, dropPkt: 10, dropCause: CAUSE_A}
FLOW 2 {X => Y, pkt: 40, dropPkt: 5, dropCause: CAUSE_B}
FLOW 3 {X => Y, pkt: 30, (nodrop)}

A flow can contain a mix of dropped and non-dropped packets.

Translated into pseudo-metrics, here's what we get:

metric_drops_packets{src=X,dst=Y,cause=CAUSE_A}: 10
metric_drops_packets{src=X,dst=Y,cause=CAUSE_B}: 5
metric_packets{src=X,dst=Y}: 100
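
The translation from flows to pseudo-metrics can be sketched as follows. This is only an illustration of the aggregation described in this issue, not the actual pipeline code: the dictionary layout and the function name `to_pseudo_metrics` are hypothetical.

```python
from collections import defaultdict

# The three example flows from this issue (field names are illustrative).
FLOWS = [
    {"src": "X", "dst": "Y", "pkt": 30, "dropPkt": 10, "dropCause": "CAUSE_A"},
    {"src": "X", "dst": "Y", "pkt": 40, "dropPkt": 5, "dropCause": "CAUSE_B"},
    {"src": "X", "dst": "Y", "pkt": 30, "dropPkt": 0, "dropCause": None},
]


def to_pseudo_metrics(flows):
    """Aggregate flows into two label-keyed counters (hypothetical helper)."""
    packets = defaultdict(int)  # keyed by (src, dst)
    drops = defaultdict(int)    # keyed by (src, dst, cause)
    for flow in flows:
        packets[(flow["src"], flow["dst"])] += flow["pkt"]
        if flow["dropPkt"] > 0:
            key = (flow["src"], flow["dst"], flow["dropCause"])
            drops[key] += flow["dropPkt"]
    return dict(packets), dict(drops)


metric_packets, metric_drops_packets = to_pseudo_metrics(FLOWS)
```

Note that the drop cause only exists as a label on the drops counter. The total packets counter carries no cause label, which is why a cause filter cannot be applied to it directly.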

Now, for different queries, here are the expected vs. actual packet counts:

Query:            Expected:     Actual Loki:    Actual Prometheus:
(no filter)       100           100             100
cause=CAUSE_A     10            30 (or 10)*     10
cause=CAUSE_*     15            70 (or 15)*     15
cause!=CAUSE_A    90            40 (or 5)*      5
cause!=CAUSE_*    85            0 (or 0)*       0

*: the value in parentheses is returned when dropped bytes/packets is explicitly selected as the metric to fetch in the UI options
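
The table can be reproduced with a small model of the three behaviours. This is a sketch under assumptions, not the real query code: the functions `expected`, `loki` and `prometheus` are hypothetical, glob matching via `fnmatchcase` stands in for the `CAUSE_*` filter, and the "expected" semantics for a not-equals filter (total packets minus drops with the excluded cause) is inferred from the numbers in the table.

```python
from fnmatch import fnmatchcase

# The three example flows from this issue (field names are illustrative).
FLOWS = [
    {"pkt": 30, "dropPkt": 10, "dropCause": "CAUSE_A"},
    {"pkt": 40, "dropPkt": 5, "dropCause": "CAUSE_B"},
    {"pkt": 30, "dropPkt": 0, "dropCause": None},
]


def matches(flow, pattern):
    """True if the flow has a drop cause matching the glob pattern."""
    cause = flow["dropCause"]
    return cause is not None and fnmatchcase(cause, pattern)


def expected(pattern, negate=False):
    """Assumed semantics: '=' counts matching drops, '!=' counts everything else."""
    total = sum(f["pkt"] for f in FLOWS)
    dropped = sum(f["dropPkt"] for f in FLOWS if matches(f, pattern))
    return total - dropped if negate else dropped


def selected(pattern, negate):
    """Flows kept by both backends: flows without drops are always excluded."""
    return [
        f for f in FLOWS
        if f["dropCause"] is not None and matches(f, pattern) != negate
    ]


def loki(pattern, negate=False, drops_metric=False):
    """Sums whole-flow packets unless the dropped-packets metric is requested."""
    field = "dropPkt" if drops_metric else "pkt"
    return sum(f[field] for f in selected(pattern, negate))


def prometheus(pattern, negate=False):
    """Always sums the drops counter for the selected cause labels."""
    return sum(f["dropPkt"] for f in selected(pattern, negate))


for pattern, negate in [("CAUSE_A", False), ("CAUSE_*", False),
                        ("CAUSE_A", True), ("CAUSE_*", True)]:
    op = "!=" if negate else "="
    print(f"cause{op}{pattern}", expected(pattern, negate),
          loki(pattern, negate), loki(pattern, negate, drops_metric=True),
          prometheus(pattern, negate))
```

In this model, both backends go wrong on not-equals filters for the same reason: they start from flows (or series) that carry a drop cause, so packets that were never dropped cannot be counted.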

is related to: NETOBSERV-1649 Follow-up: manage drop metrics via prom (Closed)

links to: netobserv/network-observability-console-plugin#549: NETOBSERV-1649: Improve UX and cases managed with prometheus


Assignee: Unassigned
Reporter: Joel Takvorian
Votes: 0
Watchers: 1
Created: 2024/07/01 12:52 PM
Updated: 2025/03/10 3:56 PM
