Bug
Resolution: Done
Major
netobserv-1.5-candidate
False
None
False
NetObserv - Sprint 248, NetObserv - Sprint 249, NetObserv - Sprint 250, NetObserv - Sprint 251, NetObserv - Sprint 252
Important
Background
While investigating NETOBSERV-1458 on Jan 26, I discovered that eBPF agent memory usage in the 120-node cluster-density-v2 test was 94% higher than in the 1.4.2 baseline run from Nov 21.
Although cluster-density-v2 had not been run since then, we have been running the 25-node node-density-heavy test weekly. The last successful baseline of that test was Jan 8. Subsequent runs showed stable memory on Jan 15 and an increase on Jan 22, which mmahmoud@redhat.com fixed the following day. As a result, at this smaller scale we had no indication of a memory increase this severe.
Attempted Solutions Thus Far
Attempted change | PR | Result |
Added a flag to PacketDrop to account for RHEL 9.3 behavior | netobserv-ebpf-agent/pull/258 | 109.57% increase over 1.4.2 |
Removed some Loki labels | network-observability-operator/pull/552 | 125.28% increase over 1.4.2, but 61.70% more flows were processed, so the increase is less severe than it looks |
Reduced maxGlobalStreamsPerTenant from 200000 to 150000 | N/A | 88.98% increase over 1.4.2 |
Reduced the overall number of Loki streams | network-observability-operator/pull/554 | 102.56% increase over 1.4.2, but 71.37% more flows were processed, so the increase is less severe than it looks |
Reran the above test at the request of mmahmoud@redhat.com | network-observability-operator/pull/554 | 99.06% increase over 1.4.2, but 69.55% more flows were processed, so the increase is less severe than it looks |
PR image + eBPF pod memory limit raised from the 800Mi default to 2000Mi | network-observability-operator/pull/554 | 107.47% increase over 1.4.2, but 101.23% more flows were processed |
Ran with default settings using bundle 104 | N/A | 107.35% increase over 1.4.2, despite 1.50% fewer flows |
Reran 1.4.2 | N/A | 0.79% increase over the original baseline with 1.64% more flows, so essentially the same (FLP had authentication issues with Loki, so no flows were written) |
Ran with default settings using bundle 107 | N/A | 130.67% increase over 1.4.2, with only 0.17% more flows |
Reran bundle 107 with pprof | N/A | 103.52% increase over 1.4.2, despite 0.04% fewer flows |
Lowered the default kafkaBatchSize, eBPF memory limit of 1600Mi | network-observability-operator/pull/566 | 24.34% decrease in eBPF memory usage, but 133.54% increase in Loki memory usage |
Lowered the default kafkaBatchSize, eBPF memory limit of 800Mi | network-observability-operator/pull/566 | 25.61% decrease in eBPF memory usage, but 155.27% increase in Loki memory usage |
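Several of these runs processed noticeably more flows than the 1.4.2 baseline, so the raw percentages overstate the regression. One rough way to compare runs is to normalize eBPF memory by the number of flows processed. The Go sketch below does that for a few of the rows; it assumes memory grows linearly with flow count, which these tests do not confirm, and the function name `perFlowIncrease` is hypothetical, not part of any NetObserv tooling.

```go
package main

import "fmt"

// perFlowIncrease is a hypothetical helper: it converts a raw memory change
// and a change in processed flows (both in percent relative to the 1.4.2
// baseline) into a percent change in memory per processed flow.
// Assumption: memory scales linearly with flows processed.
func perFlowIncrease(memPct, flowPct float64) float64 {
	memRatio := 1 + memPct/100   // e.g. +125.28% -> 2.2528x baseline memory
	flowRatio := 1 + flowPct/100 // e.g. +61.70%  -> 1.6170x baseline flows
	return (memRatio/flowRatio - 1) * 100
}

func main() {
	// Values copied from the results table; labels are shortened.
	runs := []struct {
		name    string
		memPct  float64
		flowPct float64
	}{
		{"Removed some Loki labels (pull/552)", 125.28, 61.70},
		{"Reduced Loki streams (pull/554)", 102.56, 71.37},
		{"Rerun of pull/554", 99.06, 69.55},
		{"pull/554 + 2000Mi eBPF limit", 107.47, 101.23},
		{"Default settings, bundle 107", 130.67, 0.17},
	}
	for _, r := range runs {
		fmt.Printf("%-38s mem %+7.2f%%  flows %+7.2f%%  per-flow %+7.2f%%\n",
			r.name, r.memPct, r.flowPct, perFlowIncrease(r.memPct, r.flowPct))
	}
}
```

Under this simplification, the pull/552 run (125.28% more memory, 61.70% more flows) works out to roughly 39% more memory per flow, while the bundle 107 run with default settings stays above 130% because its flow count barely changed. That contrast is consistent with the regression not being explained by flow volume alone.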
- is related to
NETOBSERV-590 Improve eBPF agent performance 1.5
- Closed
NETOBSERV-1091 netobserv-ebpf-agent performance degradation between 1.3 and 1.2
- Closed
NETOBSERV-1107 Improve ebpf agent memory usage
- Closed
- relates to
NETOBSERV-1468 Add monthly NetObserv Perf runs for larger-scale scenarios
- Closed
- split from
NETOBSERV-1330 Run performance tests for 1.5 release
- Closed
- links to