-
Bug
-
Resolution: Won't Do
-
Major
-
None
-
Logging 5.0.5
-
False
-
False
-
NEW
-
NEW
-
Undefined
-
-
Logging (Core) - Sprint 209
We have one cluster where fluentd buffer files are filling up the node disk.
Initially, new pods could not be scheduled to the node because disk usage was over 85%, so the scheduler refused to place them. The fluentd pod could not run there either.
sh-4.4# df -h | grep nvme
/dev/nvme0n1p4  350G  298G   52G  86% /host
/dev/nvme0n1p3  364M  190M  151M  56% /host/boot
sh-4.4# du -sh *
3.0G    default
277G    retry_default
We then removed some of the buffer files manually to free up disk space.
sh-4.4# cd /host/sysroot/ostree/deploy/rhcos/var/lib/fluentd
sh-4.4# du -sh *
8.5G    default
84G     retry_default
sh-4.4# df -h .
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p4  350G  111G  239G  32% /host/sysroot
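For reference, the cleanup was presumably done from a node debug shell; a minimal sketch is below, assuming /var/lib/fluentd is the host path behind the /host/sysroot/... directory above. The node name and the one-day retention window are placeholders, not values taken from this report.

oc debug node/<node-name>
chroot /host
# same data as /host/sysroot/ostree/deploy/rhcos/var/lib/fluentd seen above
du -sh /var/lib/fluentd/*
# prune retry chunks older than one day to free disk space
find /var/lib/fluentd/retry_default -type f -name 'buffer*' -mtime +1 -delete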
After that, fluentd was able to run, but the pod log was full of errors; it could not flush logs to Elasticsearch and also could not clean up the old buffers.
2021-07-21 05:43:12 +0000 [warn]: suppressed same stacktrace
2021-07-21 05:44:35 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=73 next_retry_seconds=2021-07-21 05:45:41 +0000 chunk="5c66aed6802f04e974fe229e5376a31c" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
2021-07-21 05:44:35 +0000 [warn]: suppressed same stacktrace
2021-07-21 05:44:35 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=74 next_retry_seconds=2021-07-21 05:45:37 +0000 chunk="5c66ae46a917f94bf59e60b371bc0246" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
2021-07-21 05:44:35 +0000 [warn]: suppressed same stacktrace
2021-07-21 05:45:55 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=75 next_retry_seconds=2021-07-21 05:46:51 +0000 chunk="5c66ae46a917f94bf59e60b371bc0246" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
2021-07-21 05:45:55 +0000 [warn]: suppressed same stacktrace
2021-07-21 05:45:56 +0000 [warn]: [retry_default] failed to flush the buffer. retry_time=76 next_retry_seconds=2021-07-21 05:46:56 +0000 chunk="5c66aed6802f04e974fe229e5376a31c" error_class=Fluent::Plugin::ElasticsearchOutput::RetryStreamEmitFailure error="buffer is full."
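The "buffer is full." error means the elasticsearch output has hit its configured buffer limit, so new and retried chunks can no longer be queued. A minimal sketch for inspecting the configured limit and the backlog from the collector pod is below; the label selector, namespace, and config path are assumptions and may differ in this release.

# Assumptions: collector pods carry the label component=fluentd and mount the
# rendered config at /etc/fluent/fluent.conf -- verify before use.
POD=$(oc -n openshift-logging get pods -l component=fluentd -o name | head -1)
# show the <buffer> stanza; total_limit_size and overflow_action govern "buffer is full."
oc -n openshift-logging exec "$POD" -- grep -A12 '<buffer' /etc/fluent/fluent.conf
# count queued chunk files for the retry_default buffer
oc -n openshift-logging exec "$POD" -- sh -c 'ls /var/lib/fluentd/retry_default | wc -l'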
Version-Release number of selected component (if applicable):
openshift v4.7.19
cluster-logging 5.0.5-11
How reproducible:
Not sure.
Steps to Reproduce:
1. See the description above.
Actual results:
The buffer files cannot be flushed to Elasticsearch or cleaned up.
Expected results:
The fluentd service should be able to handle the buffers when they grow beyond the configured capacity (see the illustrative buffer stanza under Additional info below).
Additional info:
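A possible mitigation while this is unresolved would be to cap the on-disk file buffer in the elasticsearch output so it cannot fill the node disk. The stanza below is only an illustrative fluentd <buffer> section, not the configuration shipped with cluster-logging 5.0.5; the sizes and the overflow_action value are assumptions to be tuned per cluster.

# inside the elasticsearch <match> block of fluent.conf (illustrative values)
<buffer>
  @type file
  path /var/lib/fluentd/default
  total_limit_size 8g        # hard cap on disk space this buffer may use
  chunk_limit_size 8m
  overflow_action block      # apply back-pressure instead of raising "buffer is full."
  flush_thread_count 2
  retry_max_interval 300s
</buffer>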