Network Observability / NETOBSERV-1470

netobserv-ebpf-agent performance degradation between 1.5 and 1.4.2

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • netobserv-1.5-candidate
    • Component: eBPF
    • Sprints: NetObserv - Sprint 248, NetObserv - Sprint 249, NetObserv - Sprint 250, NetObserv - Sprint 251, NetObserv - Sprint 252
    • Important

      Background

      While working on the solution to NETOBSERV-1458 on Jan 26, I discovered that eBPF agent memory usage for the 120-node cluster-density-v2 test was 94% higher than in the 1.4.2 baseline run from Nov 21.

      While cluster-density-v2 had not been run since then, we have been running the 25-node node-density-heavy test weekly. The last successful baseline of that test was Jan 8; the Jan 15 run showed stable memory, and the Jan 22 run showed an increase that was fixed by mmahmoud@redhat.com the following day. As such, we had no indicator at this smaller scale of a memory increase of this severity.
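
      The percentage figures used throughout this report (the 94% above and the per-run numbers in the table below) are simple deltas against the 1.4.2 baseline; several rows also note that the run processed more flows, which softens the raw number. A minimal sketch of both the raw comparison and a per-flow normalization, using made-up values rather than the actual run data:

      package main

      import "fmt"

      // pctOver reports how much higher current is than baseline, in percent.
      func pctOver(current, baseline float64) float64 {
          return (current - baseline) / baseline * 100
      }

      func main() {
          // Hypothetical peak eBPF agent memory (MiB) and flows processed;
          // the real figures come from the perf runs described in this report.
          baselineMem, currentMem := 400.0, 900.0
          baselineFlows, currentFlows := 1_000_000.0, 1_600_000.0

          fmt.Printf("raw memory increase:      %.2f%%\n", pctOver(currentMem, baselineMem))
          // Normalizing by flows processed is one way to read the rows that
          // handled substantially more traffic than the baseline run did.
          fmt.Printf("per-flow memory increase: %.2f%%\n",
              pctOver(currentMem/currentFlows, baselineMem/baselineFlows))
      }

      With these hypothetical inputs the raw increase is 125% while the per-flow increase is roughly 41%, which is the sense in which "more flows were processed" makes a row less severe than it looks.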

      Attempted Solutions Thus Far

      Attempt | PR | Result
      Add flag to PacketDrop to account for RHEL 9.3 behavior | netobserv-ebpf-agent/pull/258 | 109.57% increase over 1.4.2
      Removed some Loki labels | network-observability-operator/pull/552 | 125.28% increase over 1.4.2, but 61.70% more flows were processed, so the number is less severe than it looks
      Reduce maxGlobalStreamsPerTenant from 200000 to 150000 | N/A | 88.98% increase over 1.4.2
      Reduce overall number of Loki streams | network-observability-operator/pull/554 | 102.56% increase over 1.4.2, but 71.37% more flows were processed, so the number is less severe than it looks
      Rerun of the above test at the request of mmahmoud@redhat.com | network-observability-operator/pull/554 | 99.06% increase over 1.4.2, but 69.55% more flows were processed, so the number is less severe than it looks
      PR image + eBPF pod memory limit raised from the 800Mi default to 2000Mi | network-observability-operator/pull/554 | 107.47% increase over 1.4.2, but 101.23% more flows were processed
      Run with default settings using bundle 104 | N/A | 107.35% increase over 1.4.2, but 1.50% fewer flows
      Reran 1.4.2 | N/A | 0.79% increase over the original baseline with 1.64% more flows, essentially the same (had issues with FLP auth against Loki, so no flows were written to Loki)
      Run with default settings using bundle 107 | N/A | 130.67% increase over 1.4.2, but only 0.17% more flows
      Bundle 107 rerun with pprof (see the pprof sketch after this table) | N/A | 103.52% increase over 1.4.2, but 0.04% fewer flows
      Lower default kafkaBatchSize, eBPF memory limit of 1600Mi | network-observability-operator/pull/566 | 24.34% decrease in eBPF memory usage, but 133.54% increase in Loki memory usage
      Lower default kafkaBatchSize, eBPF memory limit of 800Mi | network-observability-operator/pull/566 | 25.61% decrease in eBPF memory usage, but 155.27% increase in Loki memory usage
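
      The bundle 107 rerun above was done with pprof enabled so heap profiles could be pulled from the agent. As a point of reference only, and not necessarily how netobserv-ebpf-agent actually wires it up (the port and the enablement mechanism here are assumptions), this is the standard way a Go service exposes pprof endpoints:

      package main

      import (
          "log"
          "net/http"
          _ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
      )

      func main() {
          // Expose pprof on a side port; a heap profile can then be fetched with:
          //   go tool pprof http://<agent-pod-ip>:6060/debug/pprof/heap
          go func() {
              log.Println(http.ListenAndServe("localhost:6060", nil))
          }()

          select {} // stand-in for the agent's real flow-processing loop
      }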

       

            Assignee: Unassigned
            Reporter: Nathan Weinberg (nweinber1)
