  1. Network Observability
  2. NETOBSERV-1545

Expose a counter of BPF drops

    • Type: Story
    • Resolution: Done
    • Priority: Major
    • eBPF
    • OCPSTRAT-1207 - Improve Network Observability Operator performance with latest eBPF enhancements (bpfman, Tcx hook latest kernel & RHEL9.4)
    • NetObserv - Sprint 251, NetObserv - Sprint 252

      I don't have a reproducer here; this is a purely theoretical issue, found by reading the code.

      Roughly, our BPF program tries to update the flow maps whenever a new packet is received; if the update fails, it sends the packet to userspace as a one-packet flow via the ring buffer.

      However, as the code comment here notes, that doesn't work when the map update fails on an already existing flow: https://github.com/netobserv/netobserv-ebpf-agent/blob/main/bpf/flows.c#L94-L96

      In this case, the packet is simply dropped, which leads to underestimated metrics (bps, pps, ...).
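
The three outcomes described above can be illustrated with a small userspace model. This is only a sketch, not the agent's code: `FlowTable`, `failing_keys`, and `reported_packets` are hypothetical names, and the update failure on an existing flow is simulated rather than reproduced.

```python
from collections import deque


class FlowTable:
    """Toy model of the flow map plus the ring buffer fallback (hypothetical names)."""

    def __init__(self, capacity, failing_keys=()):
        self.capacity = capacity
        self.flows = {}                        # flow key -> aggregated packet count
        self.ringbuf = deque()                 # one-packet flows forwarded to userspace
        self.failing_keys = set(failing_keys)  # keys whose in-place update fails (simulated)

    def on_packet(self, key):
        if key in self.flows:
            if key in self.failing_keys:
                # Update failed on an existing flow: nothing records this packet.
                return "dropped"
            self.flows[key] += 1
            return "aggregated"
        if len(self.flows) >= self.capacity:
            # New flow cannot be inserted: fall back to a one-packet flow.
            self.ringbuf.append((key, 1))
            return "ringbuf"
        self.flows[key] = 1
        return "aggregated"

    def reported_packets(self):
        # Packets that userspace will eventually see, from both paths.
        return sum(self.flows.values()) + sum(n for _, n in self.ringbuf)


table = FlowTable(capacity=1, failing_keys={"a"})
outcomes = [table.on_packet(k) for k in ("a", "b", "a")]
print(outcomes, table.reported_packets())
```

In this model three packets are seen but only two are reported, which is the underestimation described above.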

      While we could certainly try not to drop those packets (e.g. by creating an ad-hoc one-packet flow to forward via the ring buffer), this may increase the CPU load on the agent, which is already dropping packets because it is too busy, so it is perhaps not desirable.

      What we can do, however, is use a global variable to count these drops and expose it to userspace, which would add it to the drops Prometheus metric. That way, people at least know when these drops happen and can try to further optimize the agent config to prevent them.
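
A minimal sketch of the proposal, again as a userspace model rather than real BPF or agent code. `BpfSide`, `DropsMetric`, `drop_count`, and `collect` are hypothetical names; the sketch assumes the global only ever increases, so userspace can add the delta since its previous read to the existing drops metric.

```python
class BpfSide:
    """Toy stand-in for the BPF program state (hypothetical names)."""

    def __init__(self):
        self.flows = {}        # flow key -> aggregated packet count
        self.drop_count = 0    # models the proposed global drop counter

    def on_packet(self, key, update_ok=True):
        # update_ok=False simulates a failed map update on an existing flow.
        if key in self.flows and not update_ok:
            self.drop_count += 1   # count the drop instead of losing it silently
            return
        self.flows[key] = self.flows.get(key, 0) + 1


class DropsMetric:
    """Toy stand-in for a monotonically increasing drops counter in userspace."""

    def __init__(self):
        self.total = 0
        self._last_read = 0

    def collect(self, bpf):
        # The global only grows, so export the delta since the previous read.
        delta = bpf.drop_count - self._last_read
        self._last_read = bpf.drop_count
        self.total += delta
        return delta


bpf = BpfSide()
metric = DropsMetric()
bpf.on_packet("a")
bpf.on_packet("a", update_ok=False)
bpf.on_packet("a", update_ok=False)
metric.collect(bpf)
print(bpf.flows["a"], metric.total)  # prints "1 2"
```

The flow still under-counts packets, but the two lost packets now show up in the drops total, so the gap is visible.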

            mmahmoud@redhat.com Mohamed Mahmoud
            jtakvori Joel Takvorian
            Nathan Weinberg Nathan Weinberg