Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: netobserv-ocp4.13, netobserv-1.2
Affects Version/s: netobserv-1.2
Component/s: FLP
Labels:
- perfscale_25_nodes

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Important

Target Version:
None
Release Blocker:
None
Sprint:
NetObserv - Sprint 234

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

During recent 1.2 PerfScale testing, we have repeatedly observed that once a cluster is under load, FLP pods exhibit the following behavior:

- Large groups of pods are deleted (not restarted) and immediately recreated. Because the pods are replaced rather than restarted, this does not show up as restarts. See the attached video flp.webm for reference.
- We have seen two different effects on flows. In some cases flows continue to be processed (likely because Kafka was deployed in these tests), but we have also seen scenarios in which nodes hosting both LokiStack and FLP resources go into NotReady state and flows are dropped.

We initially thought this was due to cert reloads, but that does not seem to be the case: the behavior does not occur over time on a cluster that is not under load. We ran a dedicated test in a similar environment with no generated traffic, and the FLP pod behavior was not observed; the pods remained stable. The current working theory is that the issue is related to cluster resources/allocation.

Since the issue has been reproduced multiple times, I'm opening this bug to serve as a tracker while we collect more data and try to identify the root cause of this behavior.

Previous discussions relating to this bug:

Attachments:
flp.webm (3.67 MB), uploaded by Nathan Weinberg on 2023/03/28 2:49 PM

Issue Links:
- relates to: NETOBSERV-684 Watch TLS certs & reload (Closed)
- split from: NETOBSERV-902 QE: Run performance tests for 1.2 release (Closed)
- links to: netobserv/network-observability-operator#312: NETOBSERV-963 revert most of cert watching

Mentioned on:
- Merge request - Updated 4 upstream sources
- Merge request - Updated US source to: bce2e23 NETOBSERV-963, revert most of cert watching (#315)

Assignee: Joel Takvorian
Reporter: Nathan Weinberg
Need Info From: None
Contributors: None
Architect: None
QA Contact: Nathan Weinberg
Doc Contact: None
Votes: 0
Watchers: 6
Created: 2023/03/28 2:49 PM
Updated: 2025/07/29 5:35 PM
Resolved: 2023/04/03 2:16 PM
