Type: Bug
Status: NEW
Resolution: Obsolete
Priority: Major
Severity: Moderate
Release Note Type: Bug Fix
Customer Escalated
Description of problem:
Prometheus cannot scrape the collector (Fluentd) metrics in a dual-stack cluster.
The "CollectorNodeDown" alert fires continuously for all collectors when the fluentd collector is used in a dual-stack cluster.
Checking with ss for IPv6 listeners on port 24231:
$ oc -n openshift-logging get pods -l component=collector -o=custom-columns=:metadata.name --no-headers | xargs -r -I {} oc -n openshift-logging exec {} -c collector -- bash -c 'echo -n {}:; ss -6 -lt | grep 24231;'
collector-6pd7j:LISTEN 0 4096 [::]:24231 [::]:*
collector-9wv7l:LISTEN 0 4096 [::]:24231 [::]:*
collector-czc5r:LISTEN 0 4096 [::]:24231 [::]:*
collector-dp2pk:LISTEN 0 4096 [::]:24231 [::]:*
collector-g59z6:LISTEN 0 4096 [::]:24231 [::]:*
collector-kn6hf:LISTEN 0 4096 [::]:24231 [::]:*
collector-lqjf5:LISTEN 0 4096 [::]:24231 [::]:*
collector-t6mqs:LISTEN 0 4096 [::]:24231 [::]:*
collector-vm2hn:LISTEN 0 4096 [::]:24231 [::]:*
collector-wtk6w:LISTEN 0 4096 [::]:24231 [::]:*
Checking with ss for IPv4 listeners on port 24231:
$ oc -n openshift-logging get pods -l component=collector -o=custom-columns=:metadata.name --no-headers | xargs -r -I {} oc -n openshift-logging exec {} -c collector -- bash -c 'echo -n {}:; ss -4 -lt | grep 24231;'
collector-6pd7j:command terminated with exit code 1
collector-9wv7l:command terminated with exit code 1
collector-czc5r:command terminated with exit code 1
collector-dp2pk:command terminated with exit code 1
collector-g59z6:command terminated with exit code 1
collector-kn6hf:command terminated with exit code 1
collector-lqjf5:command terminated with exit code 1
collector-t6mqs:command terminated with exit code 1
collector-vm2hn:command terminated with exit code 1
collector-wtk6w:command terminated with exit code 1
No collector is listening on IPv4, so grep finds no match and each exec exits with code 1. Logging itself works fine; the alert fires only because Prometheus cannot connect to the collector:
$ oc project openshift-monitoring
$ oc rsh prometheus-k8s-0
sh-4.4$ curl -kv https://x.x.x.x:24231/metrics
* Trying x.x.x.x...
* TCP_NODELAY set
* connect to x.x.x.x port 24231 failed: Connection refused
* Failed to connect to x.x.x.x port 24231: Connection refused
* Closing connection 0
curl: (7) Failed to connect to x.x.x.x port 24231: Connection refused
The following workaround fixed the issue (a shell sketch follows the list):
- Set clusterlogging to Unmanaged.
- Take a backup of the collector-config ConfigMap.
- In the collector-config ConfigMap, change the line bind "#{ENV['PROM_BIND_IP']}" to bind "0.0.0.0".
- Save the ConfigMap and restart the collector pods.
- Verified that curl from Prometheus to the collector pods succeeded and the "CollectorNodeDown" alert cleared.
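A minimal shell sketch of the workaround above (assumptions: the bind line appears verbatim in the collector-config ConfigMap, deleted collector pods are recreated by their daemonset, and the sed pattern may need adjusting to the exact file contents):
# Stop the operator from reconciling the collector config.
$ oc -n openshift-logging patch clusterlogging/instance --type merge \
    -p '{"spec":{"managementState":"Unmanaged"}}'
# Back up the current ConfigMap.
$ oc -n openshift-logging get configmap collector-config -o yaml > collector-config.bak.yaml
# Rewrite the templated bind address to the IPv4 wildcard and re-apply.
$ oc -n openshift-logging get configmap collector-config -o yaml \
    | sed 's/bind "#{ENV\[.PROM_BIND_IP.\]}"/bind "0.0.0.0"/' \
    | oc apply -f -
# Restart the collectors so fluentd re-reads the config.
$ oc -n openshift-logging delete pods -l component=collector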
Version-Release number of selected component (if applicable):
The bug is present only with the fluentd collector, not with vector.
Logging versions 5.8.2 and 5.8.3.
How reproducible: 100%
Steps to Reproduce:
- Deploy a dual-stack cluster.
- Install logging operator version 5.8.2.
- Set up clusterlogging/instance with the fluentd collector.
- Check connectivity from Prometheus to the collector metrics endpoint (see the sketch after this list).
- After some time, the "CollectorNodeDown" alert starts firing.
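A hedged check for the connectivity step above, assuming Prometheus reaches the collectors on their primary (IPv4) pod IPs; a refused connection prints HTTP code 000:
$ for ip in $(oc -n openshift-logging get pods -l component=collector \
      -o jsonpath='{.items[*].status.podIP}'); do
    oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
      curl -sk -o /dev/null -w "$ip %{http_code}\n" "https://$ip:24231/metrics"
  done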
Actual results:
Prometheus is unable to curl the collector metrics endpoint, and the "CollectorNodeDown" alert fires continuously.
Expected results:
Prometheus should be able to curl the collector metrics endpoint, and the "CollectorNodeDown" alert should not fire.