-
Bug
-
Resolution: Won't Do
-
Major
-
None
-
Logging 5.8.2
-
False
-
None
-
False
-
NEW
-
NEW
-
Bug Fix
-
-
-
Important
Description of problem:
Specifically after the update to OpenShift Container Platform 4 - Cluster Logging 5.8, it was found that collector pods based on Fluentd are taking a long time to finish start-up and also expose the metrics endpoint.
On empty, non busy Cluster this can already take up to 2 minutes while on busy Clusters it was observed to take +10 minutes, triggering the alert CollectorNodeDown rather unexpected.
Right now, it's not clear why start-up is taking that much time but it definitely should be reduced as otherwise customers are facing Critical alerts within their OpenShift Container Platform 4 - Cluster just because a new OpenShift Container Platform 4 - Node is being added (for example) and the collector endpoint is not yet exposing the metrics endpoint, which in turn triggers CollectorNodeDown.
collector with LOG_LEVEL set to trace shows the below activity.
$ oc logs collector-fqmdp -f POD_IPS: 10.135.0.28, PROM_BIND_IP: 0.0.0.0 Setting each total_size_limit for 3 buffers to 5338456678 bytes Setting queued_chunks_limit_size for each buffer to 636 Setting chunk_limit_size for each buffer to 8388608 /var/lib/fluentd/pos/journal_pos.json exists, checking if yajl parser able to parse this json file without any error. ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux] RUBY_GC_HEAP_OLDOBJECT_LIMIT_FACTOR=0.900000 (default value: 2.000000) checking if /var/lib/fluentd/pos/journal_pos.json a valid json by calling yajl parser 2024-01-26 09:04:10 +0000 [info]: fluent/log.rb:362:info: init standalone logger path=nil rotate_age=nil rotate_size=nil 2024-01-26 09:04:10 +0000 [warn]: fluent/log.rb:383:warn: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp 2024-01-26 09:04:10 +0000 [warn]: fluent/log.rb:383:warn: '@' is the system reserved prefix. It works in the nested configuration for now but it will be rejected: @timestamp 2024-01-26 09:04:10 +0000 [info]: fluent/log.rb:362:info: parsing config file is succeeded path="/etc/fluent/fluent.conf" 2024-01-26 09:04:22 +0000 [info]: fluent/log.rb:362:info: starting fluentd-1.16.2 without supervision pid=1 ruby="3.1.2" 2024-01-26 09:04:22 +0000 [info]: fluent/log.rb:362:info: adding filter in @CONCAT pattern="kubernetes.**" type="concat" 2024-01-26 09:04:33 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'concat' 2024-01-26 09:04:34 +0000 [trace]: fluent/log.rb:319:trace: registered metrics plugin 'local' 2024-01-26 09:04:34 +0000 [info]: fluent/log.rb:362:info: adding match in @CONCAT pattern="kubernetes.**" type="relabel" 2024-01-26 09:04:35 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'relabel' 2024-01-26 09:04:35 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="journal" type="grep" 2024-01-26 09:04:36 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'grep' 2024-01-26 09:04:37 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="ovn-audit.log**" type="record_modifier" 2024-01-26 09:04:38 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'record_modifier' 2024-01-26 09:04:38 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="k8s-audit.log openshift-audit.log" type="record_modifier" 2024-01-26 09:04:38 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="journal" type="rewrite_tag_filter" 2024-01-26 09:04:39 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'rewrite_tag_filter' 2024-01-26 09:04:39 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eee38aa20 @keys="CONTAINER_NAME">, /^k8s_kibana\./, "", "kubernetes.journal.container.kibana", nil] 2024-01-26 09:04:39 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb13e68 @keys="CONTAINER_NAME">, /^k8s_[^_]+_logging-eventrouter-[^_]+_/, "", "kubernetes.journal.container._default_.kubernetes-event", nil] 2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb17400 @keys="CONTAINER_NAME">, /^k8s_[^_]+_[^_]+_default_/, "", "kubernetes.journal.container._default_", nil] 2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb104e8 @keys="CONTAINER_NAME">, /^k8s_[^_]+_[^_]+_kube-(.+)_/, "", "kubernetes.journal.container._kube-$1_", nil] 2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb10768 @keys="CONTAINER_NAME">, /^k8s_[^_]+_[^_]+_openshift-(.+)_/, "", "kubernetes.journal.container._openshift-$1_", nil] 2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb10740 @keys="CONTAINER_NAME">, /^k8s_[^_]+_[^_]+_openshift_/, "", "kubernetes.journal.container._openshift_", nil] 2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb10538 @keys="CONTAINER_NAME">, /^k8s_.*fluentd/, "", "kubernetes.journal.container.fluentd", nil] 2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: CONTAINER_NAME [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb6ee30 @keys="CONTAINER_NAME">, /^k8s_/, "", "kubernetes.journal.container", nil] 2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding rewrite_tag_filter rule: _TRANSPORT [#<Fluent::PluginHelper::RecordAccessor::Accessor:0x00007f8eeeb13288 @keys="_TRANSPORT">, /.+/, "", "journal.system", nil] 2024-01-26 09:04:40 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="kubernetes.**" type="kubernetes_metadata" 2024-01-26 09:04:42 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'kubernetes_metadata' 2024-01-26 09:05:41 +0000 [debug]: [kubernetes-metadata] Found directory with secrets: /var/run/secrets/kubernetes.io/serviceaccount 2024-01-26 09:05:41 +0000 [debug]: [kubernetes-metadata] Found CA certificate: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt 2024-01-26 09:05:41 +0000 [debug]: [kubernetes-metadata] Found pod token: /var/run/secrets/kubernetes.io/serviceaccount/token 2024-01-26 09:05:41 +0000 [debug]: [kubernetes-metadata] Creating K8S client 2024-01-26 09:05:42 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="kubernetes.var.log.pods.**_eventrouter-**" type="parse_json_field" 2024-01-26 09:06:03 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'parse_json_field' 2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="k8s-audit.log**" type="record_modifier" 2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="openshift-audit.log**" type="record_modifier" 2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="**" type="viaq_data_model" 2024-01-26 09:06:03 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'viaq_data_model' 2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: Configured 6 handlers for viaq_data_model 2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding filter in @INGRESS pattern="**" type="elasticsearch_genid_ext" 2024-01-26 09:06:03 +0000 [trace]: fluent/log.rb:319:trace: registered filter plugin 'elasticsearch_genid_ext' 2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="kubernetes.var.log.pods.openshift_** kubernetes.var.log.pods.openshift-*_** kubernetes.var.log.pods.default_** kubernetes.var.log.pods.kube-*_** journal.** system.var.log**" type="relabel" 2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="kubernetes.**" type="relabel" 2024-01-26 09:06:03 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="linux-audit.log** k8s-audit.log** openshift-audit.log** ovn-audit.log**" type="null" 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'null' 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered buffer plugin 'memory' 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @INGRESS pattern="**" type="stdout" 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'stdout' 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered formatter plugin 'stdout' 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered formatter plugin 'json' 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @_APPLICATION pattern="**" type="record_modifier" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @_APPLICATION pattern="**" type="relabel" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @_INFRASTRUCTURE pattern="**" type="record_modifier" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @_INFRASTRUCTURE pattern="**" type="relabel" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @DEFAULT_PIPELINE_0_ pattern="**" type="relabel" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="record_modifier" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="viaq_data_model" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: Configured 2 handlers for viaq_data_model 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="viaq_data_model" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: Configured 3 handlers for viaq_data_model 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="record_modifier" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding filter in @DEFAULT pattern="**" type="record_modifier" 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @DEFAULT pattern="retry_default" type="elasticsearch" 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered output plugin 'elasticsearch' 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered buffer plugin 'file' 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'host elasticsearch' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host: elasticsearch' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'index_name fluentd' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'index_name: fluentd' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'template_name ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'template_name: ' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'logstash_prefix logstash' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_prefix: logstash' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' has timestamp placeholders, but chunk key 'time' is not configured 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'deflector_alias ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'deflector_alias: ' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'application_name default' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'application_name: default' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'ilm_policy_id logstash-policy' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'ilm_policy_id: logstash-policy' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [retry_default] Need substitution: false 2024-01-26 09:06:04 +0000 [debug]: [retry_default] 'host_placeholder elasticsearch' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host_placeholder: elasticsearch' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding match in @DEFAULT pattern="**" type="elasticsearch" 2024-01-26 09:06:04 +0000 [debug]: [default] 'host elasticsearch' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host: elasticsearch' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [default] 'index_name fluentd' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'index_name: fluentd' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [default] 'template_name ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'template_name: ' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [default] 'logstash_prefix logstash' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_prefix: logstash' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [default] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' has timestamp placeholders, but chunk key 'time' is not configured 2024-01-26 09:06:04 +0000 [debug]: [default] 'logstash_dateformat %Y.%m.%d' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'logstash_dateformat: %Y.%m.%d' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [default] 'deflector_alias ' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'deflector_alias: ' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [default] 'application_name default' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'application_name: default' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [default] 'ilm_policy_id logstash-policy' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'ilm_policy_id: logstash-policy' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [debug]: [default] Need substitution: false 2024-01-26 09:06:04 +0000 [debug]: [default] 'host_placeholder elasticsearch' is tested built-in placeholder(s) but there is no valid placeholder(s). error: Parameter 'host_placeholder: elasticsearch' doesn't have tag placeholder 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="prometheus" 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'prometheus' 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="prometheus_monitor" 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'monitor_agent' 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'prometheus_monitor' 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="collected_tail_monitor" 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'collected_tail_monitor' 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="prometheus_output_monitor" 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'prometheus_output_monitor' 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="systemd" 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'systemd' 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: adding source type="tail" 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered parser plugin 'regexp' 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered parser plugin 'multiline' 2024-01-26 09:06:04 +0000 [trace]: fluent/log.rb:319:trace: registered input plugin 'tail' 2024-01-26 09:06:04 +0000 [debug]: fluent/log.rb:341:debug: No fluent logger for internal event 2024-01-26 09:06:04 +0000 [info]: fluent/log.rb:362:info: starting fluentd worker pid=1 ppid=0 worker=0 2024-01-26 09:06:04 +0000 [debug]: [default] restoring buffer file: path = /var/lib/fluentd/default/buffer.b60fd59414fb468704da38689820a82dd.log 2024-01-26 09:06:04 +0000 [debug]: [default] buffer started instance=50740 stage_size=1385 queue_size=0 2024-01-26 09:06:04 +0000 [debug]: [default] flush_thread actually running 2024-01-26 09:06:04 +0000 [debug]: [default] enqueue_thread actually running 2024-01-26 09:06:04 +0000 [trace]: [default] enqueueing all chunks in buffer instance=50740 2024-01-26 09:06:04 +0000 [trace]: [default] enqueueing chunk instance=50740 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag="journal.system", variables=nil, seq=0> 2024-01-26 09:06:04 +0000 [debug]: [default] flush_thread actually running 2024-01-26 09:06:04 +0000 [debug]: [retry_default] buffer started instance=50540 stage_size=0 queue_size=0 2024-01-26 09:06:04 +0000 [debug]: [retry_default] flush_thread actually running 2024-01-26 09:06:04 +0000 [debug]: [retry_default] flush_thread actually running 2024-01-26 09:06:04 +0000 [debug]: [retry_default] enqueue_thread actually running 2024-01-26 09:06:04 +0000 [trace]: [retry_default] enqueueing all chunks in buffer instance=50540 2024-01-26 09:06:04 +0000 [debug]: [container-input] Compacted entries: [113256693, 113256700, 230698608, 230698592, 308859, 311379, 31460507, 31460481, 186665527, 39849774, 39849783, 220227405, 220227401, 159387771, 159387760, 144712228, 144712221, 144712223, 157291095, 157291617, 247495896, 247495872, 239079181, 239079226, 161490779, 161492440, 218132301, 218139419, 236981098, 236981908, 86024950, 48558971, 48558972, 142625135, 142668580, 10516499, 10516501, 205528655, 205528643, 167837969, 167837955, 251665500, 247496289, 247496260, 92318080, 92277873, 71313938, 71313936, 106979091, 106979072, 180376975, 180376966, 35657852, 35657837, 5099504, 5709147, 88083260, 88083214, 88095616, 88083220, 312905, 312912, 39847711, 39847707, 90185448, 90185427, 8401141, 8401137, 226496711, 226496712, 188767668, 188767672, 56627374, 56627380, 230697759, 230697754, 33570296, 33570293, 218139422, 218139424, 190887557, 190887555, 67271796, 67271790, 276829459, 276829440, 94374066, 94374062, 56638849, 56638850, 125862027, 44075516, 176184067, 246170919, 260198082, 59447331, 83937097, 39847705, 39847699, 77714700, 77714688, 207697209, 207697195, 207697196, 157291626, 157291628, 157308797, 157311490, 111154213, 111154217, 278970461, 278970470, 86011448, 86011454, 216034907, 216034914, 155341652, 155341662, 60854617, 60854630, 5703197, 5705886, 182523913, 182523924, 199336135, 199336145, 128045419, 128045430, 8421127, 170054485, 170054498, 251677717, 251677705, 29424998, 29425013] 2024-01-26 09:06:04 +0000 [debug]: [container-input] Remove missing entries. existing_targets=[113256693, 113256700, 230698608, 230698592, 308859, 311379, 31460507, 31460481, 186665527, 39849774, 39849783, 220227405, 220227401, 159387771, 159387760, 144712228, 144712221, 144712223, 157291095, 157291617, 247495896, 247495872, 239079181, 239079226, 161490779, 161492440, 218132301, 218139419, 236981098, 236981908, 86024950, 48558971, 48558972, 142625135, 142668580, 10516499, 10516501, 205528655, 205528643, 167837969, 167837955, 251665500, 247496289, 247496260, 92318080, 92277873, 71313938, 71313936, 106979091, 106979072, 180376975, 180376966, 35657852, 35657837, 5099504, 5709147, 88083260, 88083214, 88095616, 88083220, 312905, 312912, 39847711, 39847707, 90185448, 90185427, 8401141, 8401137, 226496711, 226496712, 188767668, 188767672, 56627374, 56627380, 230697759, 230697754, 33570296, 33570293, 218139422, 218139424, 190887557, 190887555, 67271796, 67271790, 276829459, 276829440, 94374066, 94374062, 56638849, 56638850, 125862027, 44075516, 176184067, 246170919, 260198082, 59447331, 83937097, 39847705, 39847699, 77714700, 77714688, 207697209, 207697195, 207697196, 157291626, 157291628, 157308797, 157311490, 111154213, 111154217, 278970461, 278970470, 86011448, 86011454, 216034907, 216034914, 155341652, 155341662, 60854617, 60854630, 5703197, 5705886, 182523913, 182523924, 199336135, 199336145, 128045419, 128045430, 8421127, 170054485, 170054498, 251677717, 251677705, 29424998, 29425013] entries_after_removing=[113256693, 113256700, 230698608, 230698592, 308859, 311379, 31460507, 31460481, 186665527, 39849774, 39849783, 220227405, 220227401, 159387771, 159387760, 144712228, 144712221, 144712223, 157291095, 157291617, 247495896, 247495872, 239079181, 239079226, 161490779, 161492440, 218132301, 218139419, 236981098, 236981908, 86024950, 48558971, 48558972, 142625135, 142668580, 10516499, 10516501, 205528655, 205528643, 167837969, 167837955, 251665500, 247496289, 247496260, 92318080, 92277873, 71313938, 71313936, 106979091, 106979072, 180376975, 180376966, 35657852, 35657837, 5099504, 5709147, 88083260, 88083214, 88095616, 88083220, 312905, 312912, 39847711, 39847707, 90185448, 90185427, 8401141, 8401137, 226496711, 226496712, 188767668, 188767672, 56627374, 56627380, 230697759, 230697754, 33570296, 33570293, 218139422, 218139424, 190887557, 190887555, 67271796, 67271790, 276829459, 276829440, 94374066, 94374062, 56638849, 56638850, 125862027, 44075516, 176184067, 246170919, 260198082, 59447331, 83937097, 39847705, 39847699, 77714700, 77714688, 207697209, 207697195, 207697196, 157291626, 157291628, 157308797, 157311490, 111154213, 111154217, 278970461, 278970470, 86011448, 86011454, 216034907, 216034914, 155341652, 155341662, 60854617, 60854630, 5703197, 5705886, 182523913, 182523924, 199336135, 199336145, 128045419, 128045430, 8421127, 170054485, 170054498, 251677717, 251677705, 29424998, 29425013] 2024-01-26 09:06:04 +0000 [debug]: fluent/log.rb:341:debug: listening prometheus http server on https:://0.0.0.0:24231//metrics for worker0
Specifically fluent/log.rb:319:trace: registered filter plugin 'kubernetes_metadata' takes a long time and it's not clear what is happening during that time.
2024-01-26 09:06:04 +0000 [debug]: fluent/log.rb:341:debug: listening prometheus http server on https:://0.0.0.0:24231//metrics for worker0 shows when the metrics become available. Given this is a complete empty OpenShift Container Platform 4 - Cluster it's definitely unexpected that this takes 2 minutes.
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4 - Cluster Logging 5.8.2
How reproducible:
Always
Steps to Reproduce:
- Install OpenShift Container Platform 4
- Deploy OpenShift Container Platform 4 - Cluster Logging 5.8.2 with elasticsearch as log store configured and FluentD for collector
- Restart collector and check metrics availability using date; oc exec collector-<id> – curl -vk https://localhost:24231/metrics
Actual results:
$ date; oc exec collector-<id> -- curl -vk https://localhost:24231/metrics
Fri Jan 26 09:51:13 AM CET 2024
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying ::1:24231...
* connect to ::1 port 24231 failed: Connection refused
* Trying 127.0.0.1:24231...
* connect to 127.0.0.1 port 24231 failed: Connection refused
* Failed to connect to localhost port 24231: Connection refused
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
* Closing connection 0
curl: (7) Failed to connect to localhost port 24231: Connection refused
command terminated with exit code 7
Is being reported for more than 2 minutes on an empty OpenShift Container Platform 4 - Cluster and can take up to 10 minutes on a busy OpenShift Container Platform 4 - Cluster.
Expected results:
$ date; oc exec collector-<id> -- curl -vk https://localhost:24231/metrics
Fri Jan 26 10:06:05 AM CET 2024
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0* Trying ::1:24231...
* connect to ::1 port 24231 failed: Connection refused
* Trying 127.0.0.1:24231...
* Connected to localhost (127.0.0.1) port 24231 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
* TLSv1.0 (OUT), TLS header, Certificate Status (22):
} [5 bytes data]
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
} [512 bytes data]
* TLSv1.2 (IN), TLS header, Certificate Status (22):
{ [5 bytes data]
metrics to become available quickly so scraping can happen and otherwise improve CollectorNodeDown so that it does not fire already after 10 minutes as this will trigger too much false/positive.