Type: Story
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: eBPF
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
ebpf-performance-1.6
Feature Link:
OCPSTRAT-1207 - Improve Network Observability Operator performance with latest eBPF enhancements (bpfman, Tcx hook latest kernel & RHEL9.4)
Sprint:
NetObserv - Sprint 251, NetObserv - Sprint 252

I don't have a reproducer here; this is a purely theoretical issue found by reading the code.

Our BPF program works, roughly, by trying to update the flow maps when new packets are received and, if the update fails, sending the 1-packet flow to user space via the ringbuf.

However, as commented here, that doesn't work when the map update failure occurs on an already existing flow: https://github.com/netobserv/netobserv-ebpf-agent/blob/main/bpf/flows.c#L94-L96

In this case, the packet is just dropped, which leads to underestimated metrics (bps, pps, ...).
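To make the failure mode concrete, here is a minimal user-space sketch of the per-packet logic described above. It is not the actual BPF code: the map is a tiny array, the ringbuf is just a pair of counters, and all names (`agent_sim`, `handle_packet`, `fail_next_update`, ...) are hypothetical. The point is only to show that a failed update on an existing flow leaves the packet unaccounted for, whereas a failed insert of a new flow is still forwarded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_FLOWS 4 /* deliberately tiny stand-in for the BPF hash map size */

enum outcome { AGGREGATED, FORWARDED_VIA_RINGBUF, DROPPED };

struct flow {
    bool used;
    uint32_t id;
    uint64_t packets;
    uint64_t bytes;
};

struct agent_sim {
    struct flow flows[MAX_FLOWS]; /* stand-in for the flows hash map */
    uint64_t ringbuf_packets;     /* 1-packet flows forwarded to user space */
    uint64_t ringbuf_bytes;
    bool fail_next_update;        /* hook to inject a map update failure */
};

/* Simulated map update: fails on an injected error or when no slot is left. */
bool map_update(struct agent_sim *s, struct flow *slot, uint32_t id,
                uint64_t packets, uint64_t bytes)
{
    if (s->fail_next_update) {
        s->fail_next_update = false;
        return false;
    }
    if (slot == NULL)
        return false; /* map is full */
    slot->used = true;
    slot->id = id;
    slot->packets = packets;
    slot->bytes = bytes;
    return true;
}

/* Account for one packet of `len` bytes belonging to flow `id`. */
enum outcome handle_packet(struct agent_sim *s, uint32_t id, uint32_t len)
{
    assert(s != NULL);
    struct flow *existing = NULL, *free_slot = NULL;
    for (size_t i = 0; i < MAX_FLOWS; i++) {
        struct flow *f = &s->flows[i];
        if (f->used && f->id == id)
            existing = f;
        else if (!f->used && free_slot == NULL)
            free_slot = f;
    }
    if (existing != NULL) {
        if (map_update(s, existing, id, existing->packets + 1,
                       existing->bytes + len))
            return AGGREGATED;
        return DROPPED; /* existing flow: the packet is silently lost */
    }
    if (map_update(s, free_slot, id, 1, len))
        return AGGREGATED;
    /* New flow that could not be stored: forward it as a 1-packet flow. */
    s->ringbuf_packets += 1;
    s->ringbuf_bytes += len;
    return FORWARDED_VIA_RINGBUF;
}

/* Bytes that user space would eventually see (map entries plus ringbuf). */
uint64_t accounted_bytes(const struct agent_sim *s)
{
    uint64_t total = s->ringbuf_bytes;
    for (size_t i = 0; i < MAX_FLOWS; i++)
        if (s->flows[i].used)
            total += s->flows[i].bytes;
    return total;
}
```

With this sketch, sending two 100-byte packets on the same flow and failing the second update leaves `accounted_bytes` at 100, which is the kind of silent under-count described here.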

While we could certainly try not to drop those packets (e.g. by creating an ad-hoc one-packet flow to forward via the ringbuf), this may increase the load on the agent CPU (the agent is already dropping because it is too busy), so it is perhaps not desirable.

What we can do, however, is use a global variable to count these drops and expose it to user space, which would add it to the drops Prometheus metric. That way, people at least know when these drops happen and can try to further optimize the agent config to prevent them.
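As a rough illustration of that idea, the sketch below models the counter in plain user-space C rather than BPF. Everything in it is an assumption made for the example: the global name `hashmap_update_drops`, the `drops_metric` struct, and the choice to read the counter periodically and add only the delta to the metric. How the agent actually reads a BPF global and feeds Prometheus may well differ.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical global: packets lost to failed updates of existing flows. */
_Atomic uint64_t hashmap_update_drops;

/* Would be called at the point where the packet is currently just dropped. */
void count_hashmap_update_drop(void)
{
    atomic_fetch_add_explicit(&hashmap_update_drops, 1, memory_order_relaxed);
}

/* User-space view of the (already existing) drops metric. */
struct drops_metric {
    uint64_t total;     /* value that would be exported to Prometheus */
    uint64_t last_read; /* counter value seen at the previous collection */
};

/* Add the drops seen since the last collection to the metric. */
void collect_drops(struct drops_metric *m)
{
    uint64_t now = atomic_load_explicit(&hashmap_update_drops,
                                        memory_order_relaxed);
    assert(now >= m->last_read); /* the counter only ever grows */
    m->total += now - m->last_read;
    m->last_read = now;
}
```

Counting is a single increment on the failure path, so it should add far less work than building and forwarding an ad-hoc flow for each dropped packet. Tracking `last_read` keeps repeated collections from counting the same drops twice.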

links to

netobserv/netobserv-ebpf-agent#304: NETOBSERV-1545: Expose a counter for BPF hashmap update packets drop

Assignee: Mohamed Mahmoud
Reporter: Joel Takvorian
QA Contact: Nathan Weinberg
Votes: 0
Watchers: 2
Created: 2024/02/29 8:59 AM
Updated: 2024/04/29 3:47 PM
Resolved: 2024/04/08 3:39 PM
