-
Bug
-
Resolution: Done
-
Normal
-
Logging 5.3.0
-
False
-
False
-
NEW
-
VERIFIED
-
Before this update, internal changes to the cluster-logging-operator removed resources that allowed the operator's metrics to be scraped. With this update, those resources were added to resolve the issue.
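As a quick post-fix verification (a hedged sketch: the resource names are assumed from the service listing later in this report, and whether a ServiceMonitor is shipped for the operator may vary by release), one can check that the metrics Service and its ServiceMonitor exist again after the update:
$ oc -n openshift-logging get service,servicemonitor | grep cluster-logging-operator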
-
Description of Problem: Hello Team, Alertmanager started to fire the following alert after cluster-logging was upgraded to 5.3.0-55.
Version-Release number of selected component (if applicable):
Server Version: 4.8.17
cluster-logging.5.3.0-55
How Reproducible:
Always
Steps To Reproduce:
- Upgrade to cluster-logging 5.3.0-55
- The alert fires in Alertmanager (a CLI check is sketched below)
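One way to list the firing alerts from the CLI (a hedged sketch: it assumes the amtool binary is available inside the alertmanager container of the alertmanager-main-0 pod, which is the case on recent OpenShift releases):
$ oc -n openshift-monitoring exec alertmanager-main-0 -c alertmanager -- \
    amtool alert query --alertmanager.url=http://localhost:9093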
Additional Information:
Alert Details:
Labels:
    alertname = TargetDown
    job = cluster-logging-operator-metrics
    namespace = openshift-logging
    prometheus = openshift-monitoring/k8s
    service = cluster-logging-operator-metrics
    severity = warning
Annotations:
    description = 100% of the cluster-logging-operator-metrics/cluster-logging-operator-metrics targets in openshift-logging namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.
    summary = Some targets were not reachable from the monitoring server for an extended period of time.
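The TargetDown alert is driven by the up metric for the scrape target, so the underlying condition can be checked directly against the in-cluster query endpoint (a hedged sketch: it assumes the thanos-querier route in openshift-monitoring and a user allowed to query metrics). A value of 0, or no result at all, is consistent with the alert:
$ TOKEN=$(oc whoami -t)
$ HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
$ curl -skG -H "Authorization: Bearer $TOKEN" \
    --data-urlencode 'query=up{namespace="openshift-logging", service="cluster-logging-operator-metrics"}' \
    "https://$HOST/api/v1/query"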
$ oc get pods
NAME                                        READY   STATUS      RESTARTS   AGE
cluster-logging-operator-55c7dc97c9-pjmhp 1/1 Running 0 10h
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
elasticsearch-cdm-xxxx-1-xxxxx-xxxxx 2/2 Running 0 23h
elasticsearch-cdm-xxxx-2-xxxxx-xxxxx 2/2 Running 0 22h
elasticsearch-cdm-xxxx-3-xxxxx-xxxxx 2/2 Running 0 22h
elasticsearch-im-app-27279375-8frhb 0/1 Failed 0 3d
elasticsearch-im-app-27284220-n9jxk 0/1 Succeeded 0 14m
elasticsearch-im-audit-27283830-nxf5n 0/1 Failed 0 6h44m
elasticsearch-im-audit-27284220-lsslm 0/1 Succeeded 0 14m
elasticsearch-im-infra-27283980-c62sh 0/1 Failed 0 4h14m
elasticsearch-im-infra-27284220-xm8b5 0/1 Succeeded 0 14m
kibana-57c7d75755-xxxx 2/2 Running 0 1d
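The operator pod itself is Running, so it is worth checking its logs for errors around the metrics endpoint (a hedged diagnostic sketch; the grep pattern is only a convenience):
$ oc -n openshift-logging logs deployment/cluster-logging-operator | grep -i metric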
$ oc get service
NAME                               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
cluster-logging-operator-metrics ClusterIP 172.30.51.195 <none> 8383/TCP,8686/TCP 67d
collector ClusterIP 172.30.13.223 <none> 24231/TCP,2112/TCP 1d
elasticsearch ClusterIP 172.30.15.24 <none> 9200/TCP 150d
elasticsearch-cluster ClusterIP 172.30.21.182 <none> 9300/TCP 150d
elasticsearch-metrics ClusterIP 172.30.70.112 <none> 60001/TCP 150d
kibana ClusterIP 172.30.249.34 <none> 443/TCP 150d
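The Service still exists, so the next thing to check is whether anything backs it, i.e. whether its selector still matches the operator pod (a hedged check; an empty ENDPOINTS column would be one explanation for the unreachable target):
$ oc -n openshift-logging get endpoints cluster-logging-operator-metrics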
Curl output:
sh-4.4$ curl -kvv http://172.30.51.195:8686/metrics
* Trying 172.30.51.195...
* TCP_NODELAY set
* connect to 172.30.51.195 port 8686 failed: Connection refused
* Failed to connect to 172.30.51.195 port 8686: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 172.30.51.195 port 8686: Connection refused
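Connection refused on the ClusterIP suggests nothing is listening on port 8686 any more. A first-pass check is to look at the ports the operator container declares (hedged: the pod name is taken from the listing above, and a process can listen on a port without declaring it, so an empty result is only a hint):
$ oc -n openshift-logging get pod cluster-logging-operator-55c7dc97c9-pjmhp \
    -o jsonpath='{.spec.containers[*].ports}'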
Let me know in case any further details are required.
- clones
-
LOG-1975 [release-5.3] After Upgrading to Cluster logging 5.3.0-55 receiving alerts Target Down `cluster-logging-operator`
- Closed
- duplicates
-
LOG-2092 After Upgrading to Cluster logging 5.3.0-55 receiving alerts Target Down `cluster-logging-operator`
- Closed
- is cloned by
-
LOG-2092 After Upgrading to Cluster logging 5.3.0-55 receiving alerts Target Down `cluster-logging-operator`
- Closed
- links to