Uploaded image for project: 'OpenShift Logging'
  1. OpenShift Logging
  2. LOG-5023

Fluentd collector pod takes long time to be ready and expose metric endpoint

XMLWordPrintable

    • False
    • None
    • False
    • NEW
    • NEW
    • Bug Fix
    • Important

      Description of problem:

      Specifically after the update to OpenShift Container Platform 4 - Cluster Logging 5.8, it was found that collector pods based on Fluentd are taking a long time to finish start-up and also expose the metrics endpoint.

      On empty, non busy Cluster this can already take up to 2 minutes while on busy Clusters it was observed to take +10 minutes, triggering the alert CollectorNodeDown rather unexpected.

      Right now, it's not clear why start-up is taking that much time but it definitely should be reduced as otherwise customers are facing Critical alerts within their OpenShift Container Platform 4 - Cluster just because a new OpenShift Container Platform 4 - Node is being added (for example) and the collector endpoint is not yet exposing the metrics endpoint, which in turn triggers CollectorNodeDown.

      collector with LOG_LEVEL set to trace shows the below activity.

      $ oc logs collector-fqmdp -f
      POD_IPS: 10.135.0.28, PROM_BIND_IP: 0.0.0.0
      Setting each total_size_limit for 3 buffers to 5338456678 bytes
      Setting queued_chunks_limit_size for each buffer to 636
      Setting chunk_limit_size for each buffer to 8388608
      /var/lib/fluentd/pos/journal_pos.json exists, checking if yajl parser able to parse this json file without any error.
      ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
      RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.900000 (default value: 2.000000)
      checking if /var/lib/fluentd/pos/journal_pos.json a valid json by calling yajl parser
      2024-01-26 09:04:10 +0000 [info]: fluent/log.rb:362:info: init standalone logger path=nil rotate_age=nil rotate_size=nil
      2024-01-26 09:04:10 +0000 [warn]: fluent/log.rb:383:warn: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
      2024-01-26 09:04:10 +0000 [warn]: fluent/log.rb:383:warn: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp
      2024-01-26 09:04:10 +0000 [info]: fluent/log.rb:362:info: parsing config file is succeeded path="/etc/fluent/fluent.conf"
      2024-01-26 09:04:22 +0000 [info]: fluent/log.rb:362:info: starting fluentd-1.16.2 without supervision pid=1 ruby="3.1.2"
      2024-01-26 09:04:22 +0000 [info]: fluent/log.rb:362:info: adding filter in @CONCAT pattern="kubernetes.**" type="concat"
      2024-01-26 09:04:33 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'concat'
      2024-01-26 09:04:34 +0000 [trace]: fluent/log.rb:319:trace: registered metrics plugin 'local'
      2024-01-26 09:04:34 +0000 [info]: fluent/log.rb:362:info: adding match in @CONCAT pattern="kubernetes.**" type="relabel"
      2024-01-26 09:04:35 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'relabel'
      2024-01-26 09:04:35 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="journal" type="grep"
      2024-01-26 09:04:36 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'grep'
      2024-01-26 09:04:37 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="ovn-audit.log**" type="record_modifier"
      2024-01-26 09:04:38 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'record_modifier'
      2024-01-26 09:04:38 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="k8s-audit.log openshift-audit.log" type="record_modifier"
      2024-01-26 09:04:38 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="journal" type="rewrite_tag_filter"
      2024-01-26 09:04:39 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'rewrite_tag_filter'
      2024-01-26 09:04:39 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eee38aa20 @keys="CONTAINER_NAME">, /^k8s_kibana\./, "", "kubernetes.journal.container.kibana", nil]
      2024-01-26 09:04:39 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb13e68 @keys="CONTAINER_NAME">, /^k8s_[^_]+_logging-eventrouter-[^_]+_/, "", "kubernetes.journal.container._default_.kubernetes-event", nil]
      2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb17400 @keys="CONTAINER_NAME">, /^k8s_[^_]+_[^_]+_default_/, "", "kubernetes.journal.container._default_", nil]
      2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb104e8 @keys="CONTAINER_NAME">, /^k8s_[^_]+_[^_]+_kube-(.+)_/, "", "kubernetes.journal.container._kube-$1_", nil]
      2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb10768 @keys="CONTAINER_NAME">, /^k8s_[^_]+_[^_]+_openshift-(.+)_/, "", "kubernetes.journal.container._openshift-$1_", nil]
      2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb10740 @keys="CONTAINER_NAME">, /^k8s_[^_]+_[^_]+_openshift_/, "", "kubernetes.journal.container._openshift_", nil]
      2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb10538 @keys="CONTAINER_NAME">, /^k8s_.*fluentd/, "", "kubernetes.journal.container.fluentd", nil]
      2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb6ee30 @keys="CONTAINER_NAME">, /^k8s_/, "", "kubernetes.journal.container", nil]
      2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: _TRANSPORT [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb13288 @keys="_TRANSPORT">, /.+/, "", "journal.system", nil]
      2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="kubernetes.**" type="kubernetes_metadata"
      2024-01-26 09:04:42 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'kubernetes_metadata'
      2024-01-26 09:05:41 +0000 [debug]: [kubernetes-metadata] Found directory with secrets: /var/run/secrets/kubernetes.io/serviceaccount
      2024-01-26 09:05:41 +0000 [debug]: [kubernetes-metadata] Found CA certificate: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      2024-01-26 09:05:41 +0000 [debug]: [kubernetes-metadata] Found pod token: /var/run/secrets/kubernetes.io/serviceaccount/token
      2024-01-26 09:05:41 +0000 [debug]: [kubernetes-metadata] Creating K8S client
      2024-01-26 09:05:42 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="kubernetes.var.log.pods.**_eventrouter-**" type="parse_json_field"
      2024-01-26 09:06:03 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'parse_json_field'
      2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="k8s-audit.log**" type="record_modifier"
      2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="openshift-audit.log**" type="record_modifier"
      2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="**" type="viaq_data_model"
      2024-01-26 09:06:03 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'viaq_data_model'
      2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: Configured 6 handlers for viaq_data_model
      2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="**" type="elasticsearch_genid_ext"
      2024-01-26 09:06:03 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'elasticsearch_genid_ext'
      2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="kubernetes.var.log.pods.openshift_** kubernetes.var.log.pods.openshift-*_** kubernetes.var.log.pods.default_** kubernetes.var.log.pods.kube-*_** journal.** system.var.log**" type="relabel"
      2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="kubernetes.**" type="relabel"
      2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="linux-audit.log** k8s-audit.log** openshift-audit.log** ovn-audit.log**" type="null"
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'null'
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered buffer plugin 'memory'
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="**" type="stdout"
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'stdout'
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered formatter plugin 'stdout'
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered formatter plugin 'json'
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @_APPLICATION pattern="**" type="record_modifier"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @_APPLICATION pattern="**" type="relabel"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @_INFRASTRUCTURE pattern="**" type="record_modifier"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @_INFRASTRUCTURE pattern="**" type="relabel"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @DEFAULT_PIPELINE_0_ pattern="**" type="relabel"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="record_modifier"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="viaq_data_model"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: Configured 2 handlers for viaq_data_model
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="viaq_data_model"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: Configured 3 handlers for viaq_data_model
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="record_modifier"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="record_modifier"
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @DEFAULT pattern="retry_default" type="elasticsearch"
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'elasticsearch'
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered buffer plugin 'file'
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'host elasticsearch' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host: elasticsearch' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'index_name fluentd' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'index_name: fluentd' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'template_name ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'template_name: ' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'logstash_prefix logstash' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_prefix: logstash' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' has timestamp placeholders, but chunk key 'time' is not configured
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'deflector_alias ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'deflector_alias: ' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'application_name default' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'application_name: default' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'ilm_policy_id logstash-policy' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'ilm_policy_id: logstash-policy' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] Need substitution: false
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'host_placeholder elasticsearch' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host_placeholder: elasticsearch' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @DEFAULT pattern="**" type="elasticsearch"
      2024-01-26 09:06:04 +0000 [debug]: [default] 'host elasticsearch' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host: elasticsearch' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [default] 'index_name fluentd' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'index_name: fluentd' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [default] 'template_name ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'template_name: ' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [default] 'logstash_prefix logstash' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_prefix: logstash' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [default] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' has timestamp placeholders, but chunk key 'time' is not configured
      2024-01-26 09:06:04 +0000 [debug]: [default] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [default] 'deflector_alias ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'deflector_alias: ' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [default] 'application_name default' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'application_name: default' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [default] 'ilm_policy_id logstash-policy' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'ilm_policy_id: logstash-policy' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [debug]: [default] Need substitution: false
      2024-01-26 09:06:04 +0000 [debug]: [default] 'host_placeholder elasticsearch' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host_placeholder: elasticsearch' doesn't have tag placeholder
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="prometheus"
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'prometheus'
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="prometheus_monitor"
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'monitor_agent'
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'prometheus_monitor'
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="collected_tail_monitor"
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'collected_tail_monitor'
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="prometheus_output_monitor"
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'prometheus_output_monitor'
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="systemd"
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'systemd'
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="tail"
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered parser plugin 'regexp'
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered parser plugin 'multiline'
      2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'tail'
      2024-01-26 09:06:04 +0000 [debug]: fluent/log.rb:341:debug: No fluent logger for internal event
      2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: starting fluentd worker pid=1 ppid=0 worker=0
      2024-01-26 09:06:04 +0000 [debug]: [default] restoring buffer file: path = /var/lib/fluentd/default/buffer.b60fd59414fb468704da38689820a82dd.log
      2024-01-26 09:06:04 +0000 [debug]: [default] buffer started instance=50740 stage_size=1385 queue_size=0
      2024-01-26 09:06:04 +0000 [debug]: [default] flush_thread actually running
      2024-01-26 09:06:04 +0000 [debug]: [default] enqueue_thread actually running
      2024-01-26 09:06:04 +0000 [trace]: [default] enqueueing all chunks in buffer instance=50740
      2024-01-26 09:06:04 +0000 [trace]: [default] enqueueing chunk instance=50740 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="journal.system", variables=nil, seq=0>
      2024-01-26 09:06:04 +0000 [debug]: [default] flush_thread actually running
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] buffer started instance=50540 stage_size=0 queue_size=0
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] flush_thread actually running
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] flush_thread actually running
      2024-01-26 09:06:04 +0000 [debug]: [retry_default] enqueue_thread actually running
      2024-01-26 09:06:04 +0000 [trace]: [retry_default] enqueueing all chunks in buffer instance=50540
      2024-01-26 09:06:04 +0000 [debug]: [container-input] Compacted entries: [113256693, 113256700, 230698608, 230698592, 308859, 311379, 31460507, 31460481, 186665527, 39849774, 39849783, 220227405, 220227401, 159387771, 159387760, 144712228, 144712221, 144712223, 157291095, 157291617, 247495896, 247495872, 239079181, 239079226, 161490779, 161492440, 218132301, 218139419, 236981098, 236981908, 86024950, 48558971, 48558972, 142625135, 142668580, 10516499, 10516501, 205528655, 205528643, 167837969, 167837955, 251665500, 247496289, 247496260, 92318080, 92277873, 71313938, 71313936, 106979091, 106979072, 180376975, 180376966, 35657852, 35657837, 5099504, 5709147, 88083260, 88083214, 88095616, 88083220, 312905, 312912, 39847711, 39847707, 90185448, 90185427, 8401141, 8401137, 226496711, 226496712, 188767668, 188767672, 56627374, 56627380, 230697759, 230697754, 33570296, 33570293, 218139422, 218139424, 190887557, 190887555, 67271796, 67271790, 276829459, 276829440, 94374066, 94374062, 56638849, 56638850, 125862027, 44075516, 176184067, 246170919, 260198082, 59447331, 83937097, 39847705, 39847699, 77714700, 77714688, 207697209, 207697195, 207697196, 157291626, 157291628, 157308797, 157311490, 111154213, 111154217, 278970461, 278970470, 86011448, 86011454, 216034907, 216034914, 155341652, 155341662, 60854617, 60854630, 5703197, 5705886, 182523913, 182523924, 199336135, 199336145, 128045419, 128045430, 8421127, 170054485, 170054498, 251677717, 251677705, 29424998, 29425013]
      2024-01-26 09:06:04 +0000 [debug]: [container-input] Remove missing entries. existing_targets=[113256693, 113256700, 230698608, 230698592, 308859, 311379, 31460507, 31460481, 186665527, 39849774, 39849783, 220227405, 220227401, 159387771, 159387760, 144712228, 144712221, 144712223, 157291095, 157291617, 247495896, 247495872, 239079181, 239079226, 161490779, 161492440, 218132301, 218139419, 236981098, 236981908, 86024950, 48558971, 48558972, 142625135, 142668580, 10516499, 10516501, 205528655, 205528643, 167837969, 167837955, 251665500, 247496289, 247496260, 92318080, 92277873, 71313938, 71313936, 106979091, 106979072, 180376975, 180376966, 35657852, 35657837, 5099504, 5709147, 88083260, 88083214, 88095616, 88083220, 312905, 312912, 39847711, 39847707, 90185448, 90185427, 8401141, 8401137, 226496711, 226496712, 188767668, 188767672, 56627374, 56627380, 230697759, 230697754, 33570296, 33570293, 218139422, 218139424, 190887557, 190887555, 67271796, 67271790, 276829459, 276829440, 94374066, 94374062, 56638849, 56638850, 125862027, 44075516, 176184067, 246170919, 260198082, 59447331, 83937097, 39847705, 39847699, 77714700, 77714688, 207697209, 207697195, 207697196, 157291626, 157291628, 157308797, 157311490, 111154213, 111154217, 278970461, 278970470, 86011448, 86011454, 216034907, 216034914, 155341652, 155341662, 60854617, 60854630, 5703197, 5705886, 182523913, 182523924, 199336135, 199336145, 128045419, 128045430, 8421127, 170054485, 170054498, 251677717, 251677705, 29424998, 29425013] entries_after_removing=[113256693, 113256700, 230698608, 230698592, 308859, 311379, 31460507, 31460481, 186665527, 39849774, 39849783, 220227405, 220227401, 159387771, 159387760, 144712228, 144712221, 144712223, 157291095, 157291617, 247495896, 247495872, 239079181, 239079226, 161490779, 161492440, 218132301, 218139419, 236981098, 236981908, 86024950, 48558971, 48558972, 142625135, 142668580, 10516499, 10516501, 205528655, 205528643, 167837969, 167837955, 251665500, 247496289, 247496260, 92318080, 92277873, 71313938, 71313936, 106979091, 106979072, 180376975, 180376966, 35657852, 35657837, 5099504, 5709147, 88083260, 88083214, 88095616, 88083220, 312905, 312912, 39847711, 39847707, 90185448, 90185427, 8401141, 8401137, 226496711, 226496712, 188767668, 188767672, 56627374, 56627380, 230697759, 230697754, 33570296, 33570293, 218139422, 218139424, 190887557, 190887555, 67271796, 67271790, 276829459, 276829440, 94374066, 94374062, 56638849, 56638850, 125862027, 44075516, 176184067, 246170919, 260198082, 59447331, 83937097, 39847705, 39847699, 77714700, 77714688, 207697209, 207697195, 207697196, 157291626, 157291628, 157308797, 157311490, 111154213, 111154217, 278970461, 278970470, 86011448, 86011454, 216034907, 216034914, 155341652, 155341662, 60854617, 60854630, 5703197, 5705886, 182523913, 182523924, 199336135, 199336145, 128045419, 128045430, 8421127, 170054485, 170054498, 251677717, 251677705, 29424998, 29425013]
      2024-01-26 09:06:04 +0000 [debug]: fluent/log.rb:341:debug: listening prometheus http server on https:://0.0.0.0:24231//metrics for worker0
      

      Specifically fluent/log.rb:319:trace: registered filter plugin 'kubernetes_metadata' takes a long time and it's not clear what is happening during that time.

      2024-01-26 09:06:04 +0000 [debug]: fluent/log.rb:341:debug: listening prometheus http server on https:://0.0.0.0:24231//metrics for worker0 shows when the metrics become available. Given this is a complete empty OpenShift Container Platform 4 - Cluster it's definitely unexpected that this takes 2 minutes.

      Version-Release number of selected component (if applicable):

      OpenShift Container Platform 4 - Cluster Logging 5.8.2

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install OpenShift Container Platform 4
      2. Deploy OpenShift Container Platform 4 - Cluster Logging 5.8.2 with elasticsearch as log store configured and FluentD for collector
      3. Restart collector and check metrics availability using date; oc exec collector-<id> – curl -vk https://localhost:24231/metrics

      Actual results:

      $ date; oc exec collector-<id> -- curl -vk https://localhost:24231/metrics
      Fri Jan 26 09:51:13 AM CET 2024
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
        0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1:24231...
      * connect to ::1 port 24231 failed: Connection refused
      *   Trying 127.0.0.1:24231...
      * connect to 127.0.0.1 port 24231 failed: Connection refused
      * Failed to connect to localhost port 24231: Connection refused
        0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
      * Closing connection 0
      curl: (7) Failed to connect to localhost port 24231: Connection refused
      command terminated with exit code 7
      

      Is being reported for more than 2 minutes on an empty OpenShift Container Platform 4 - Cluster and can take up to 10 minutes on a busy OpenShift Container Platform 4 - Cluster.

      Expected results:

      $ date; oc exec collector-<id> -- curl -vk https://localhost:24231/metrics
      Fri Jan 26 10:06:05 AM CET 2024
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
        0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying ::1:24231...
      * connect to ::1 port 24231 failed: Connection refused
      *   Trying 127.0.0.1:24231...
      * Connected to localhost (127.0.0.1) port 24231 (#0)
      * ALPN, offering h2
      * ALPN, offering http/1.1
      *  CAfile: /etc/pki/tls/certs/ca-bundle.crt
      * TLSv1.0 (OUT), TLS header, Certificate Status (22):
      } [5 bytes data]
      * TLSv1.3 (OUT), TLS handshake, Client hello (1):
      } [512 bytes data]
      * TLSv1.2 (IN), TLS header, Certificate Status (22):
      { [5 bytes data]
      

      metrics to become available quickly so scraping can happen and otherwise improve CollectorNodeDown so that it does not fire already after 10 minutes as this will trigger too much false/positive.

      Additional info:

        1. fluentd.log
          85 kB
        2. screenshot-1.png
          screenshot-1.png
          19 kB

            Unassigned Unassigned
            rhn-support-sreber Simon Reber
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: