Network Observability / NETOBSERV-1268

Wrong counters reported for large volume downloads

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • netobserv-1.4
    • netobserv-1.4-candidate
    • eBPF
    • NetObserv - Sprint 241

      When downloading or uploading large files, the generated flows report only a fraction of the expected byte count (and perhaps of the packet count as well).

      The issue was first mentioned here, but after some investigation I don't think this is related to reinterpret-direction because I see the same issue with Julien's PR that removes reinterpret-direction.

      To try to locate the bug, I'm looking at the raw flows stored in Loki rather than the computed metrics: the raw flows themselves are wrong, i.e. they always show a value smaller than what I'd expect after downloading large volumes.

      Example: capture in Loki after downloading a 500MB image (sampling is 1):

      It shows only ~100MB while I would expect 500MB. I see the same issue whether reinterpret-direction is kept or removed.
      My gut feeling is that something is wrong in the eBPF agent aggregation logic; to be investigated.
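
The comparison above (bytes reported by the raw flows vs. the size of the download) can be sketched as a small check. This is only an illustration: it assumes each raw flow is a JSON record whose byte counter is stored in a field named `Bytes`, and the sample records are made up to mirror the ~100MB / 500MB observation.

```python
import json


def total_bytes(flow_lines):
    """Sum the byte counters over raw flow records given as JSON strings."""
    total = 0
    for line in flow_lines:
        record = json.loads(line)
        # Assumption: the per-flow byte counter lives in a "Bytes" field.
        total += int(record.get("Bytes", 0))
    return total


def reported_fraction(flow_lines, expected_bytes):
    """Fraction of the expected volume that the flows actually account for."""
    return total_bytes(flow_lines) / expected_bytes


if __name__ == "__main__":
    # Synthetic records, not real captured flows.
    sample = [
        json.dumps({"Bytes": 60_000_000}),
        json.dumps({"Bytes": 40_000_000}),
    ]
    print(f"{reported_fraction(sample, 500_000_000):.0%} of expected bytes reported")
```

With sampling set to 1, a fraction well below 100% for a single large download would point at the flows themselves rather than at the metrics computed from them.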

      To reproduce, deploy NetObserv and set sampling to 1, then run curl or wget on a large file from a pod (e.g. a Linux image).
      Then check the raw flows in Grafana / Loki (from the operator: "make deploy-grafana") with a query like:

      {DstK8S_Namespace="<pod's namespace>",DstK8S_OwnerName="<pod's owner>",SrcK8S_Namespace=""}
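
For scripting the same check outside Grafana, a minimal sketch follows. It builds the selector above, forms a request URL for Loki's `query_range` HTTP endpoint, and extracts the log lines from a response body. The base URL, the namespace and owner values, and the helper names are hypothetical; how Loki is reached (route, port-forward, authentication) depends on the deployment and is not covered here.

```python
import json
from urllib.parse import urlencode


def flow_selector(namespace, owner):
    """Build the stream selector used in this report for a destination pod."""
    return (
        f'{{DstK8S_Namespace="{namespace}",'
        f'DstK8S_OwnerName="{owner}",SrcK8S_Namespace=""}}'
    )


def query_range_url(base_url, selector, start_ns, end_ns, limit=5000):
    """Form a GET URL for Loki's /loki/api/v1/query_range endpoint."""
    params = urlencode(
        {"query": selector, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url}/loki/api/v1/query_range?{params}"


def log_lines(response_body):
    """Extract the raw log lines from a 'streams' query_range response."""
    data = json.loads(response_body)
    return [
        line
        for stream in data["data"]["result"]
        for _timestamp, line in stream["values"]
    ]


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    selector = flow_selector("demo", "client")
    print(query_range_url("http://localhost:3100", selector, 0, 10**18))
```

Each returned line is one raw flow record, so the byte counters can then be totalled and compared with the size of the downloaded file.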

              mmahmoud@redhat.com Mohamed Mahmoud
              jtakvori Joel Takvorian
              Amogh Rameshappa Devapura