[LOG-1411] Underestimate queued_chunks_limit_size value with chunkLimitSize and totalLimitSize tuning parameters - Red Hat Issue Tracker

Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: Logging 5.2
Affects Version/s: None
Component/s: Log Collection
Labels:
- devel_ack+

Blocked:
False
Ready:
False
Docs QE Status:
NEW
QE Status:
NEW
Release Note Text:

Before this update, while you were tuning the performance of the Fluentd log forwarder by configuring the `chunkLimitSize` and `totalLimitSize` values, the `Setting queued_chunks_limit_size for each buffer to` message reported values that were too low. The current update fixes this issue so that this message reports the correct values.

Market:

Sprint:
Logging (Core) - Sprint 205

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

When chunkLimitSize and totalLimitSize are configured to tune the performance of the Fluentd log forwarder, queued_chunks_limit_size is unintentionally underestimated.
This happens because the configured totalLimitSize value is ignored, and the default total limit (`15% of Disk` / `num of Outputs`) is used to calculate queued_chunks_limit_size instead.

$ oc get ClusterLogging instance -o yaml
spec:
  forwarder:
    fluentd:
      buffer:
        chunkLimitSize: 8m
        totalLimitSize: 2G  
$ oc logs fluentd-xxx
Setting each total_size_limit for 1 buffers to 4910635622 bytes
Setting queued_chunks_limit_size for each buffer to 585
Setting chunk_limit_size for each buffer to 8388608

$ oc get ClusterLogging instance -o yaml
spec:
  forwarder:
    fluentd:
      buffer:
        chunkLimitSize: 1G
        totalLimitSize: 256G
$ oc logs fluentd-yyy
Setting each total_size_limit for 1 buffers to 4910635622 bytes
Setting queued_chunks_limit_size for each buffer to 4
Setting chunk_limit_size for each buffer to 1073741824
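Both logged queue limits match the integer quotient of the default per-buffer total_size_limit (4910635622 bytes, the same in both pods) and the configured chunk_limit_size. The configured totalLimitSize plays no part. The sketch below only checks that arithmetic. The helper name is hypothetical, and this is not the operator's actual code.

```python
def queued_chunks_limit(total_size_limit: int, chunk_limit_size: int) -> int:
    """Hypothetical helper: number of whole chunks that fit in total_size_limit."""
    return total_size_limit // chunk_limit_size

# Default per-buffer total_size_limit reported by both fluentd pods, in bytes.
DEFAULT_TOTAL = 4910635622

# fluentd-xxx: chunkLimitSize 8m = 8388608 bytes
print(queued_chunks_limit(DEFAULT_TOTAL, 8388608))     # 585
# fluentd-yyy: chunkLimitSize 1G = 1073741824 bytes
print(queued_chunks_limit(DEFAULT_TOTAL, 1073741824))  # 4
```

Because the numerator never changes, raising chunkLimitSize can only shrink the queue limit, whatever totalLimitSize is set to.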

The underestimated queued_chunks_limit_size value degrades performance.

In addition, the total_size_limit reported in the fluentd pod log does not match the total_limit_size configured in the generated fluent.conf.

The following excerpt of fluent.conf is taken from fluentd-yyy.
The fluentd-yyy log reports total_size_limit as 4910635622 bytes, but the configured value is 256G.

<label @FLUENTD_INFRA>
  <match **>
    # https://docs.fluentd.org/v1.0/articles/in_forward
    @type forward
    heartbeat_type none
    keepalive true

    <buffer>
      @type file
      path '/var/lib/fluentd/fluentd_infra'
      queued_chunks_limit_size "#{ENV['BUFFER_QUEUE_LIMIT'] || '1024' }"
      total_limit_size 256G
      chunk_limit_size 1G

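In the excerpt above, queued_chunks_limit_size is not a literal. The embedded Ruby expression reads the BUFFER_QUEUE_LIMIT environment variable and falls back to '1024' only when that variable is unset. The sketch below mirrors that lookup in Python. The assumption that the pod's startup logic exports BUFFER_QUEUE_LIMIT with the value printed in the "Setting queued_chunks_limit_size" log line is an inference, not something stated in this issue.

```python
import os
from typing import Mapping

def resolve_queued_chunks_limit(env: Mapping[str, str] = os.environ) -> str:
    # Mirrors Ruby's ENV['BUFFER_QUEUE_LIMIT'] || '1024': the default applies
    # only when the variable is absent (in Ruby an empty string is still truthy).
    return env.get("BUFFER_QUEUE_LIMIT", "1024")

print(resolve_queued_chunks_limit({}))                           # 1024
print(resolve_queued_chunks_limit({"BUFFER_QUEUE_LIMIT": "4"}))  # 4
```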
Version-Release number of selected component (if applicable):

The latest code from the master branch of https://github.com/openshift/cluster-logging-operator:

  $ git log --oneline
  2c48ecbc Merge pull request #1021 from vimalk78/log-1355

How reproducible:

Always

Steps to Reproduce:

1. Deploy CLO.
2. Create CLF/instance.
3. Edit the ClusterLogging CR:

  $ oc edit ClusterLogging instance
  spec:
    forwarder:
      fluentd:
        buffer:
          chunkLimitSize: 1G
          totalLimitSize: 256G

4. Redeploy the fluentd pods.
5. Check queued_chunks_limit_size in the fluentd config map and in the output of "oc logs fluentd-xxx".

Actual results:

See "Description of problem".

Expected results:

There are two options:

1. Calculate the maximum number of queued chunks from the configured chunk_limit_size and total_limit_size.
2. Allow users to tune queued_chunks_limit_size in addition to chunk_limit_size and total_limit_size.
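A minimal sketch of the first option, assuming the configured values use Fluentd's size notation (k, m, g, t suffixes as powers of 1024, case-insensitive), which matches the logged 8388608 for 8m and 1073741824 for 1G. The helper names and the lower bound of one chunk are assumptions of this sketch. The actual fix is in the pull request linked under "Additional info".

```python
import re

# Multipliers modeled on Fluentd's size notation (powers of 1024).
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def parse_size(value: str) -> int:
    """Hypothetical helper: convert strings such as '8m' or '256G' to bytes."""
    match = re.fullmatch(r"(\d+)([kmgt]?)", value.strip(), re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported size: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]

def recommended_queued_chunks_limit(total_limit_size: str, chunk_limit_size: str) -> int:
    """Hypothetical helper: whole chunks that fit in the *configured* total size."""
    total = parse_size(total_limit_size)
    chunk = parse_size(chunk_limit_size)
    if chunk <= 0:
        raise ValueError("chunk_limit_size must be positive")
    # Assumption of this sketch: keep at least one queued chunk even if
    # chunk_limit_size exceeds total_limit_size.
    return max(1, total // chunk)

print(recommended_queued_chunks_limit("2G", "8m"))    # 256
print(recommended_queued_chunks_limit("256G", "1G"))  # 256
```

With the two configurations from the description, this calculation yields 256 queued chunks in both cases, instead of the 585 and 4 derived from the default total size.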

Additional info:

links to

openshift/cluster-logging-operator#1120: LOG-1411: Calculate recommended queued_chunks_limit_size with fluentd tuning parameters

Assignee:: Keiichi Kii

Reporter:: Keiichi Kii

QA Contact:: Anping Li

Votes:: 0

Watchers:: 5

Created:: 2021/05/21 7:52 PM

Updated:: 2022/03/16 3:18 PM

Resolved:: 2021/08/05 12:48 PM
