Bug
Resolution: Done
Normal
None
False
False
NEW
NEW
Logging (Core) - Sprint 205
Description of problem:
When chunkLimitSize and totalLimitSize are configured to tune the performance of the Fluentd log forwarder, queued_chunks_limit_size is unintentionally underestimated: the tuned totalLimitSize value is ignored, and the default totalLimitSize (`15% of disk` / `number of outputs`) is used to calculate queued_chunks_limit_size.
$ oc get ClusterLogging instance -o yaml
spec:
  forwarder:
    fluentd:
      buffer:
        chunkLimitSize: 8m
        totalLimitSize: 2G

$ oc logs fluentd-xxx
Setting each total_size_limit for 1 buffers to 4910635622 bytes
Setting queued_chunks_limit_size for each buffer to 585
Setting chunk_limit_size for each buffer to 8388608

$ oc get ClusterLogging instance -o yaml
spec:
  forwarder:
    fluentd:
      buffer:
        chunkLimitSize: 1G
        totalLimitSize: 256G

$ oc logs fluentd-yyy
Setting each total_size_limit for 1 buffers to 4910635622 bytes
Setting queued_chunks_limit_size for each buffer to 4
Setting chunk_limit_size for each buffer to 1073741824
This underestimated queued_chunks_limit_size value will cause performance degradation issues.
In addition, the total_size_limit reported in the fluentd pod log doesn't match the total_limit_size configured in the generated fluent.conf. The following fluent.conf excerpt is from fluentd-yyy: the pod log says total_size_limit is 4910635622, but the configured value is 256G.
<label @FLUENTD_INFRA>
  <match **>
    # https://docs.fluentd.org/v1.0/articles/in_forward
    @type forward
    heartbeat_type none
    keepalive true
    <buffer>
      @type file
      path '/var/lib/fluentd/fluentd_infra'
      queued_chunks_limit_size "#{ENV['BUFFER_QUEUE_LIMIT'] || '1024' }"
      total_limit_size 256G
      chunk_limit_size 1G
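The logged values are consistent with the operator dividing the *default* total limit (the 4910635622 bytes reported in both pod logs) by the configured chunk size, instead of using the tuned totalLimitSize. A quick arithmetic check (a sketch with hypothetical helper names, not CLO code; 1024-based size suffixes assumed):

```python
# Default total_size_limit as logged by both fluentd-xxx and fluentd-yyy
# (15% of disk / number of outputs).
DEFAULT_TOTAL = 4910635622

def queued_chunks(total_limit_size, chunk_limit_size):
    """Max number of queued chunks = total buffer size / chunk size."""
    return total_limit_size // chunk_limit_size

# Observed (buggy) behavior: default total divided by the tuned chunk size.
print(queued_chunks(DEFAULT_TOTAL, 8 * 1024**2))  # 585, matches fluentd-xxx
print(queued_chunks(DEFAULT_TOTAL, 1024**3))      # 4, matches fluentd-yyy

# Expected behavior: use the tuned totalLimitSize (256G) instead.
print(queued_chunks(256 * 1024**3, 1024**3))      # 256
```

This matches both log excerpts above and shows how far off the computed limit is once large chunk sizes are configured (4 queued chunks instead of 256).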
Version-Release number of selected component (if applicable):
The latest code from https://github.com/openshift/cluster-logging-operator master:
$ git log --oneline
2c48ecbc Merge pull request #1021 from vimalk78/log-1355
How reproducible:
Always
Steps to Reproduce:
- Deploy CLO
- Create CLF/instance
- Edit the ClusterLogging CR
$ oc edit ClusterLogging instance
spec:
  forwarder:
    fluentd:
      buffer:
        chunkLimitSize: 1G
        totalLimitSize: 256G
- Redeploy fluentd pods
- Check queued_chunks_limit_size in the fluentd config map and in the "oc logs" output of the fluentd pods
Actual results:
See the examples in "Description of problem" above.
Expected results:
There are two options:
- Calculate the maximum number of queued chunks from the configured chunk_limit_size and total_limit_size
- Allow users to tune queued_chunks_limit_size directly, in addition to chunk_limit_size and total_limit_size
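The first option could look like the following sketch (hypothetical helpers, not the operator's actual code; it assumes fluentd's 1024-based interpretation of size suffixes like "8m" and "256G"):

```python
import re

# Byte multipliers for fluentd-style size suffixes, assumed 1024-based.
_SUFFIX = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}

def parse_size(s):
    """Parse a size string such as '8m', '1G', or '256G' into bytes."""
    m = re.fullmatch(r"(\d+)([kmgt]?)", s.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unparseable size: {s!r}")
    n, suffix = int(m.group(1)), m.group(2).lower()
    return n * _SUFFIX.get(suffix, 1)

def queued_chunks_limit_size(total_limit_size, chunk_limit_size):
    # Derive the chunk count from the *configured* limits,
    # not from the default total_size_limit.
    return max(1, parse_size(total_limit_size) // parse_size(chunk_limit_size))

print(queued_chunks_limit_size("256G", "1G"))  # 256, instead of the observed 4
print(queued_chunks_limit_size("2G", "8m"))    # 256, instead of the observed 585
```

With this approach the queue depth scales with whatever the user configures, so tuning totalLimitSize upward no longer collapses queued_chunks_limit_size to a handful of chunks.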
Additional info: