Network Observability / NETOBSERV-730

NO Controller and Kafka pods crashing in large-scale deployment

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Component: Kafka
    • Sprint: NetObserv - Sprint 229
    • Severity: Important

      Ran Test Bed 3 with node-density-heavy using the following modifications to the Loki limits:

        limits:
          global:
            ingestion:
              ingestionBurstSize: 100
              ingestionRate: 500
              maxGlobalStreamsPerTenant: 50000 
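
      For reference, this is a minimal sketch of how those limits could be applied to the LokiStack CR with a merge patch; the instance name "loki" and the "netobserv" namespace are assumptions, so substitute whatever your deployment uses:

        # Hedged sketch: apply the same ingestion limits via a merge patch.
        # "loki" (LokiStack name) and "netobserv" (namespace) are assumptions.
        oc -n netobserv patch lokistack loki --type=merge \
          -p '{"spec":{"limits":{"global":{"ingestion":{"ingestionBurstSize":100,"ingestionRate":500,"maxGlobalStreamsPerTenant":50000}}}}}'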

      Flows did process initially, but after about 25 minutes the NO Controller and Kafka Zookeeper pods began failing.

      The NO Controller had the following event occur multiple times; my suspicion is that this is due to exceeding the pod memory limits, as rhn-support-memodi previously observed in a different cluster.
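
      If this is in fact an OOM kill, the container's last termination state should show it. A rough way to check, assuming the controller runs in the netobserv namespace (the pod name placeholder is hypothetical):

        # List the controller pods, then inspect the last termination reason of
        # each container; "OOMKilled" would confirm the memory-limit suspicion.
        oc -n netobserv get pods
        oc -n netobserv get pod <no-controller-pod> \
          -o jsonpath='{range .status.containerStatuses[*]}{.name}{": "}{.lastState.terminated.reason}{"\n"}{end}'
        oc -n netobserv describe pod <no-controller-pod> | grep -A5 'Last State'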

      The Kafka Zookeeper pods began failing shortly after the NO Controller, though I am not sure why; I did see the following error when inspecting the Zookeeper pods:

        Warning  Unhealthy               58m                 kubelet                  Readiness probe errored: rpc error: code = NotFound desc = container is not created or running: checking if PID of e12b296239a1f3cf92903744ef2ea01b0a94e2b9237def6faccb52fe856ca0e7 is running failed: container process not found 
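
      That readiness-probe error only indicates the container process was already gone, so the root cause is more likely visible in the pod events and the previous container's logs. A sketch of how to dig further, where the "netobserv" namespace and the "kafka-cluster-zookeeper-0" pod name are assumptions:

        # Recent events for the Zookeeper pods, plus logs from the previous
        # (crashed) container instance; namespace and pod name are assumptions.
        oc -n netobserv get events --sort-by=.lastTimestamp | grep -i zookeeper
        oc -n netobserv logs kafka-cluster-zookeeper-0 --previous
        # Check whether any pods are running close to their resource limits.
        oc adm top pods -n netobserv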

      As can be seen in the attached chart, I canceled the workload and eventually all Zookeeper pods recovered and flows began processing again; however, the NO Controller remains unstable.

              Assignee: Joel Takvorian (jtakvori)
              Reporter: Nathan Weinberg (nweinber1)
              Votes: 0
              Watchers: 5
