  OpenShift Logging / LOG-1059

fluentd pod OOMing


Details

    • Logging (Core) - Sprint 198

    Description

      Running ROSA CloudWatch Logging PerfScale tests per https://docs.google.com/document/d/10vv_SVC7fUammkvdn6-05bGrzdAtXxiJ_Z6QMzfUL4A/edit#

       

      I am seeing fluentd OOM looping about every 15 minutes when running:

      220 log-generator pods (on a single node); a single container per pod

      250 messages per minute per pod

      512-byte message size
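
      For context, that works out to roughly 220 × 250 = 55,000 messages per minute (~917 messages/s) arriving at the single fluentd collector pod on that node, or about 917 × 512 B ≈ 0.45 MiB/s of raw message payload before any metadata enrichment.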

       

      Snippet from the fluentd pod describe output:

      ```

      State:          Running
        Started:      Tue, 02 Feb 2021 16:34:42 +0000
      Last State:     Terminated
        Reason:       OOMKilled
        Exit Code:    137
        Started:      Tue, 02 Feb 2021 16:18:32 +0000
        Finished:     Tue, 02 Feb 2021 16:34:38 +0000
      Ready:          True
      Restart Count:  2
      Limits:
        memory:  736Mi
      Requests:
        cpu:     100m
        memory:  736Mi

      ```

       

      To replicate the issue (assumes a cluster with the cluster logging add-on installed):

      Create the log-generator namespace (a minimal manifest is shown below).
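
      A minimal manifest for that namespace (sketch; running oc create namespace log-generator is equivalent):

      ```
      apiVersion: v1
      kind: Namespace
      metadata:
        name: log-generator
      ```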

      Pick a worker node that has room for 220 additional pods (i.e., fewer than 30 pods currently running on the host, assuming the default 250-pod-per-node limit).

      Apply the first log-generator Job:

      ```

      apiVersion: batch/v1
      kind: Job
      metadata:
        name: log-generator
        namespace: log-generator
      spec:
        parallelism: 110
        completions: 110
        template:
          metadata:
            labels:
              name: log-generator
          spec:
            nodeSelector:
              kubernetes.io/hostname: ip-10-0-224-163
            containers:
            - image: quay.io/dry923/log_generator
              name: log-generator
              command: ["/usr/bin/python3", "/log_generator.py"]
              args: ["--size", "512", "--duration", "60", "--messages-per-minute", "250"]
              imagePullPolicy: Never
            restartPolicy: Never

      ```

      Sleep 30 seconds (the OpenShift QPS limits will cause image back-offs if you try to deploy all 220 pods at once; you will still see some, but they recover quickly).

      Apply the second, identically configured Job:

      ```

      apiVersion: batch/v1
      kind: Job
      metadata:
        name: log-generator2
        namespace: log-generator
      spec:
        parallelism: 110
        completions: 110
        template:
          metadata:
            labels:
              name: log-generator2
          spec:
            nodeSelector:
              kubernetes.io/hostname: ip-10-0-224-163
            containers:
            - image: quay.io/dry923/log_generator
              name: log-generator
              command: ["/usr/bin/python3", "/log_generator.py"]
              args: ["--size", "512", "--duration", "60", "--messages-per-minute", "250"]
              imagePullPolicy: Never
            restartPolicy: Never

      ```

       

      CPU usage for the fluentd pod is ~80%, and its memory climbs from ~350 MB to the 736Mi cap over 15-20 minutes, causing the pod to be OOM killed.
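
      For reference, the 736Mi limit seen in the pod describe output comes from the collector resource settings. A minimal sketch of where that could be raised via the ClusterLogging custom resource follows (only the collection stanza is shown; the 1Gi value is purely illustrative, this assumes the standard logging.openshift.io/v1 ClusterLogging instance created by the cluster logging add-on, and a larger limit would only delay the OOM if memory keeps growing):

      ```
      apiVersion: logging.openshift.io/v1
      kind: ClusterLogging
      metadata:
        name: instance
        namespace: openshift-logging
      spec:
        collection:
          logs:
            type: fluentd
            fluentd:
              resources:
                limits:
                  memory: 1Gi
                requests:
                  cpu: 100m
                  memory: 1Gi
      ```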



          People

            ikarpukh Igor Karpukhin (Inactive)
            rhn-support-rzaleski Russell Zaleski
            Anping Li Anping Li


              Time Tracking

                Original Estimate: 1h
                Remaining Estimate: 1h
                Time Spent: Not Specified