[NETOBSERV-1430] Some flows are not seen as metrics (both prom and loki) - Red Hat Issue Tracker

Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: netobserv-1.5
Affects Version/s: netobserv-1.4
Component/s: Kafka
Labels: None
Blocked: False
Blocked Reason: None
Ready: False
Release Note Text:

Previously, with some specific cluster configurations, the eBPF Agent was unable to determine which node it was running on, which cascaded into some of the traffic metrics not being provided.
Now the agent's node IP is reliably provided by the Operator, inferred from the pod status, which restores those missing metrics.


Sprint: NetObserv - Sprint 246, NetObserv - Sprint 247


Description of problem:

I have set up a sample app [1] in which microservices communicate via Kafka. One of the services receives events from Kafka. In NetObserv, the flow logs table shows this traffic in both directions. But in the topology, the overview, and even the Prometheus metrics, I only see traffic from the service to Kafka, not the other way around.

Steps to Reproduce:

[1] Not sure yet how reproducible this is. However, this was seen using this sample app: https://github.com/ia3andy/one-two-three-quarkus/

[EDIT after more investigation]

The bug is actually tied to a specific cluster configuration, which I got by using a demo cluster on http://demo.redhat.com
I don't know how to reproduce a similar configuration on a more typical cluster (cluster-bot etc.). But I noticed a number of differences in NetObserv when using that cluster compared to a standard one. For instance, traffic from routes was being seen as coming from nodes, without involving the ingress router.
Note that the CNI was still OVN.
Anyway: with that config, the "AgentIP" in our flows is something like "192.168.12.2" rather than an address in the machine network 10.10.10.0/24, which is the root cause of the bug seen.
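
To illustrate the mismatch, here is a minimal sketch that checks whether an "AgentIP" value falls inside the machine network. It only uses the two values quoted above; the function name and the second sample address (10.10.10.5) are hypothetical and chosen for illustration, and this is not how the agent or the operator actually performs the check.

```python
import ipaddress

# Values quoted in this report.
MACHINE_NETWORK = "10.10.10.0/24"
REPORTED_AGENT_IP = "192.168.12.2"


def agent_ip_in_machine_network(agent_ip: str, machine_network: str) -> bool:
    """Return True if agent_ip belongs to the machine network CIDR."""
    return ipaddress.ip_address(agent_ip) in ipaddress.ip_network(machine_network)


if __name__ == "__main__":
    # AgentIP seen on the demo cluster: outside the machine network.
    print(agent_ip_in_machine_network(REPORTED_AGENT_IP, MACHINE_NETWORK))  # False
    # Hypothetical address inside the machine network, for comparison.
    print(agent_ip_in_machine_network("10.10.10.5", MACHINE_NETWORK))  # True
```

A check like this could help spot flows whose "AgentIP" does not match any node address on the machine network.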

Actual results:

See the attached screen captures.

Expected results:

Inbound and outbound traffic seen in topology, overview, and dashboards.

PS: I haven't checked whether it also affects 1.4.

See also: two samples of flow json that are visible in flow table but not as metrics: sample.txt


Attachments:

Capture d’écran du 2023-12-08 14-29-00.png (175 kB, 2023/12/08 1:38 PM)
Capture d’écran du 2023-12-08 14-30-05.png (83 kB, 2023/12/08 1:38 PM)
Capture d’écran du 2023-12-08 14-30-09.png (67 kB, 2023/12/08 1:38 PM)
Capture d’écran du 2023-12-08 14-38-26.png (120 kB, 2023/12/08 1:38 PM)
image.png (126 kB, 2024/01/08 3:39 PM)
image-2024-01-11-10-42-17-984.png (146 kB, 2024/01/11 3:42 PM)
image-2024-01-11-10-43-06-284.png (146 kB, 2024/01/11 3:43 PM)
sample.txt (2 kB, 2023/12/08 1:44 PM)

links to

netobserv/network-observability-operator#512: NETOBSERV-1430: make operator provide agent IP

openshift/openshift-docs#70738: Network Observability 1.5 release notes

RHSA-2023:121076 Network Observability 1.5.0 for OpenShift

mentioned on

Merge request - Updated 3 upstream sources

Merge request - Updated 4 upstream sources


Assignee: Joel Takvorian
Reporter: Joel Takvorian
QA Contact: Nathan Weinberg
Votes: 0
Watchers: 4
Created: 2023/12/08 1:38 PM
Updated: 2024/02/21 1:32 PM
Resolved: 2024/01/12 10:24 AM
