eBPF has been identified as a promising alternative data source for network metrics, complementing our current work on NetFlow/IPFIX.
It could provide more metrics than NetFlow/IPFIX does, including DNS-related ones (success rate, etc.) and HTTP ones (errors, latency, etc.). We can also expect much lower overhead than with flow logs.
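As an illustration of the kind of metric mentioned above, a DNS success rate could be derived from per-response events emitted by an eBPF probe. This is a minimal sketch under stated assumptions: the `dnsEvent` type is hypothetical (no existing collector format is implied), and counting only RCODE 0 (NOERROR) as a success is a choice; NXDOMAIN could arguably count as a valid answer depending on the use case.

```go
package main

import "fmt"

// dnsEvent is a HYPOTHETICAL record an eBPF probe might emit for each DNS
// response it observes; field names are illustrative only.
type dnsEvent struct {
	QueryID uint16
	RCode   uint8 // response code from the DNS header (0 = NOERROR)
}

// dnsSuccessRate returns the fraction of responses whose RCODE is NOERROR.
// Treating only NOERROR as success is an assumption of this sketch.
func dnsSuccessRate(events []dnsEvent) float64 {
	if len(events) == 0 {
		return 0
	}
	ok := 0
	for _, e := range events {
		if e.RCode == 0 {
			ok++
		}
	}
	return float64(ok) / float64(len(events))
}

func main() {
	// RCODE 3 = NXDOMAIN, RCODE 2 = SERVFAIL (RFC 1035).
	events := []dnsEvent{{1, 0}, {2, 0}, {3, 3}, {4, 2}}
	fmt.Printf("DNS success rate: %.2f\n", dnsSuccessRate(events)) // prints "DNS success rate: 0.50"
}
```

In practice such a ratio would more likely be exposed as two counters (total and successful responses) so that the rate can be computed over arbitrary time windows by the metrics backend.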
This R&D task covers further investigation of eBPF. A non-exhaustive list:
- look for potential reuse of existing eBPF metrics collectors (e.g. Flowmill [1], Pixie's Stirling [2], or Cilium)
- identify dependencies and prerequisites (e.g. what is needed to enable eBPF on nodes, run programs, and export metrics)
- determine whether the infrastructure chosen for NetFlow/IPFIX is also suitable for this data source (e.g. is Loki relevant?)
- possibly start a PoC (directly or via new subtasks)
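For the prerequisites item, a first node-level check could look at the kernel version and whether kernel BTF is exposed. This is a minimal sketch, not a complete checklist: the 5.8 threshold is an assumption (5.8 is the release that introduced the BPF ring buffer; the real minimum depends on which program types and helpers are used), and it does not check privileges such as CAP_BPF or CAP_SYS_ADMIN. `/sys/kernel/btf/vmlinux` is present when the kernel was built with `CONFIG_DEBUG_INFO_BTF`, which CO-RE programs typically rely on.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseKernelRelease extracts major and minor numbers from a release string
// such as "5.14.0-70.el9.x86_64" (the format of `uname -r`).
func parseKernelRelease(release string) (major, minor int, err error) {
	parts := strings.SplitN(strings.TrimSpace(release), ".", 3)
	if len(parts) < 2 {
		return 0, 0, fmt.Errorf("unexpected kernel release %q", release)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	// The minor component may carry a suffix (e.g. "8-rc1"): keep leading digits only.
	end := 0
	for end < len(parts[1]) && parts[1][end] >= '0' && parts[1][end] <= '9' {
		end++
	}
	if minor, err = strconv.Atoi(parts[1][:end]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// atLeast reports whether major.minor >= wantMajor.wantMinor.
func atLeast(major, minor, wantMajor, wantMinor int) bool {
	return major > wantMajor || (major == wantMajor && minor >= wantMinor)
}

func main() {
	// /proc/sys/kernel/osrelease holds the same string as `uname -r`.
	raw, err := os.ReadFile("/proc/sys/kernel/osrelease")
	if err != nil {
		fmt.Println("cannot read kernel release:", err)
		return
	}
	major, minor, err := parseKernelRelease(string(raw))
	if err != nil {
		fmt.Println(err)
		return
	}
	// ASSUMED minimum version for this sketch: 5.8 (BPF ring buffer).
	fmt.Printf("kernel %d.%d, >= 5.8: %v\n", major, minor, atLeast(major, minor, 5, 8))

	// Kernel BTF is exposed here when built with CONFIG_DEBUG_INFO_BTF.
	_, err = os.Stat("/sys/kernel/btf/vmlinux")
	fmt.Println("kernel BTF available:", err == nil)
}
```

On a cluster, the same checks could run as a DaemonSet or an init step so that unsupported nodes are reported before any eBPF program is loaded.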
[1] https://github.com/Flowmill/flowmill-collector ; Contribution proposal to CNCF: https://github.com/open-telemetry/community/issues/733
[2] https://github.com/pixie-labs/pixie/tree/main/src/stirling
Other resources:
- "How To Add eBPF Observability To Your Product" https://brendangregg.com/blog/2021-07-03/how-to-add-bpf-observability.html
Related issues:
- NETOBSERV-54 Spike: eBPF-based flow extractor (Closed)