[LOG-1975] [release-5.3] After Upgrading to Cluster logging 5.3.0-55 receiving alerts Target Down `cluster-logging-operator`

Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: Logging 5.3.2
Affects Version/s: Logging 5.3.0
Component/s: Log Collection
Labels:
- devel_ack+

Blocked: False
Ready: False
Docs QE Status: NEW
QE Status: NEW
Market:

Sprint:
Logging (Core) - Sprint 210, Logging (Core) - Sprint 211

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of Problem: Hello Team, Alertmanager started firing the TargetDown alert for `cluster-logging-operator` after cluster-logging was upgraded to 5.3.0-55.

Version-Release number of selected component (if applicable):

Server Version: 4.8.17
cluster-logging.5.3.0-55

How Reproducible:

Always

Steps To Reproduce:

- Upgrade to cluster-logging 5.3.0-55.
- The TargetDown alert fires in Alertmanager.

Additional Information:

Alert Details:

Labels
alertname = TargetDown
job = cluster-logging-operator-metrics
namespace = openshift-logging
prometheus = openshift-monitoring/k8s
service = cluster-logging-operator-metrics
severity = warning
Annotations
description = 100% of the cluster-logging-operator-metrics/cluster-logging-operator-metrics targets in openshift-logging namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.
summary = Some targets were not reachable from the monitoring server for an extended period of time.

$ oc get pods

cluster-logging-operator-55c7dc97c9-pjmhp      1/1    Running    0         10h
collector-xxxx                                2/2    Running    0         1d
collector-xxxx                                2/2    Running    0         1d
collector-xxxx                                2/2    Running    0         1d
collector-xxxx                                2/2    Running    0         1d
collector-xxxx                                2/2    Running    0         1d
collector-xxxx                                2/2    Running    0         1d
collector-xxxx                                2/2    Running    0         1d
collector-xxxx                                2/2    Running    0         1d
collector-xxxx                                2/2    Running    0         1d
collector-xxxx                                2/2    Running    0         1d
elasticsearch-cdm-xxxx-1-xxxxx-xxxxx  2/2    Running    0         23h
elasticsearch-cdm-xxxx-2-xxxxx-xxxxx  2/2    Running    0         22h
elasticsearch-cdm-xxxx-3-xxxxx-xxxxx  2/2    Running    0         22h
elasticsearch-im-app-27279375-8frhb            0/1    Failed     0         3d
elasticsearch-im-app-27284220-n9jxk            0/1    Succeeded  0         14m
elasticsearch-im-audit-27283830-nxf5n          0/1    Failed     0         6h44m
elasticsearch-im-audit-27284220-lsslm          0/1    Succeeded  0         14m
elasticsearch-im-infra-27283980-c62sh          0/1    Failed     0         4h14m
elasticsearch-im-infra-27284220-xm8b5          0/1    Succeeded  0         14m
kibana-57c7d75755-xxxx                        2/2    Running    0         1d

$ oc get service

cluster-logging-operator-metrics  ClusterIP  172.30.51.195  <none>       8383/TCP,8686/TCP   67d
collector                         ClusterIP  172.30.13.223  <none>       24231/TCP,2112/TCP  1d
elasticsearch                     ClusterIP  172.30.15.24   <none>       9200/TCP            150d
elasticsearch-cluster             ClusterIP  172.30.21.182  <none>       9300/TCP            150d
elasticsearch-metrics             ClusterIP  172.30.70.112  <none>       60001/TCP           150d
kibana                            ClusterIP  172.30.249.34  <none>       443/TCP             150d

Curl output:

sh-4.4$ curl -kvv http://172.30.51.195:8686/metrics
*   Trying 172.30.51.195...
* TCP_NODELAY set
* connect to 172.30.51.195 port 8686 failed: Connection refused
* Failed to connect to 172.30.51.195 port 8686: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 172.30.51.195 port 8686: Connection refused
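
Note: the same check can be scripted where curl is not available. The following is a minimal sketch, assuming Python 3 is present in the debugging pod; `port_state` is a hypothetical helper name, not part of any OpenShift tooling. It only distinguishes an accepted TCP connection from a refused one and does not request `/metrics`.

```python
import errno
import socket


def port_state(host: str, port: int, timeout: float = 3.0) -> str:
    """Return 'open', 'refused', or 'unreachable' for a TCP endpoint."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success or an errno value on failure.
        code = sock.connect_ex((host, port))
    except OSError:
        # Raised, for example, when the host name cannot be resolved.
        return "unreachable"
    finally:
        sock.close()
    if code == 0:
        return "open"
    if code == errno.ECONNREFUSED:
        return "refused"
    return "unreachable"
```

Calling `port_state("172.30.51.195", 8686)` from inside the cluster would be expected to report `refused` for as long as the condition in the curl output above persists.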

Let me know if any further details are required.

is cloned by

LOG-2090 After Upgrading to Cluster logging 5.3.0-55 receiving alerts Target Down `cluster-logging-operator`

Closed

links to

[KCS] 100% of cluster-logging-operator-metrics targets unreachable

openshift/cluster-logging-operator#1272: LOG-1975: return back metrics service for CLO

openshift/cluster-logging-operator#1276: [release-5.3] LOG-1975: return back metrics service for CLO

openshift/cluster-logging-operator#1278: [release-5.3] LOG-1975: return back metrics service for CLO

Nayantara Gupta (Inactive) added a comment - 2022/01/18 1:54 AM

anli@redhat.com
The customer installed logging v5.3.2.20 and is still facing the same issue.
Can you share what changes were made to fix the issue? I can then verify whether they are present in 'v5.3.2.20'.

Anping Li added a comment - 2022/01/04 10:35 AM

Fixed on cluster-logging.5.3.2-17.

Anping Li added a comment - 2021/12/28 2:52 PM

vparfono the port 8686 wasn't exposed.

Vitalii Parfonov added a comment - 2021/12/20 2:48 PM

anli@redhat.com Can you take a look, please?

Oscar Casal Sanchez added a comment - 2021/12/13 3:00 PM

Hello,

Usually the errata is linked to the issue so that we know exactly when it was fixed. Could you link it? Otherwise, we don't know when it was fixed or in which version.

Vitalii Parfonov added a comment - 2021/12/03 7:41 AM

Hello rhn-support-adsoni, the PR is under final review, so the fix will be available soon.

Vitalii Parfonov added a comment - 2021/11/29 10:00 AM

LGTM

Vitalii Parfonov added a comment - 2021/11/26 7:14 PM

rhn-support-tmicheli Hello, sorry for the delayed answer. Yes, as a workaround you can create a service like this:

apiVersion: v1
kind: Service
metadata: 
  labels: 
    name: cluster-logging-operator
  name: cluster-logging-operator-metrics
  namespace: openshift-logging
spec: 
  ports: 
    - name: cr-metrics
      port: 8080
      protocol: TCP
      targetPort: 8080
  selector: 
    name: cluster-logging-operator
  sessionAffinity: None
  type: ClusterIP

It will work on port 8080 for now, and we will continue working on a fix for 5.3.z.
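
Note: a Service like the one above only gets endpoints when its selector matches the labels on the operator pod. The following is a minimal sketch of equality-based selector matching as Kubernetes applies it to Services. The function name `selector_matches` and both pod label sets are hypothetical and for illustration only; the real labels should be read from the cluster (for example with `oc get pods --show-labels`).

```python
def selector_matches(selector: dict[str, str], pod_labels: dict[str, str]) -> bool:
    """Equality-based matching: every selector key/value must appear in the pod labels."""
    if not selector:
        # A Service without a selector gets no automatically managed endpoints.
        return False
    return all(pod_labels.get(key) == value for key, value in selector.items())


# Selector taken from the workaround Service above.
service_selector = {"name": "cluster-logging-operator"}

# Hypothetical label sets, for illustration only.
operator_pod = {"name": "cluster-logging-operator", "pod-template-hash": "55c7dc97c9"}
collector_pod = {"component": "collector"}

print(selector_matches(service_selector, operator_pod))   # True
print(selector_matches(service_selector, collector_pod))  # False
```

If the selector matches no pod, the Service is created but has no endpoints, so the target would still be reported as down.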

Vimal Kumar added a comment - 2021/11/26 10:47 AM

rhn-support-tmicheli We did some investigation and found that the metrics service is not started in 5.3. We are working on providing a fix. There is no workaround possible.

Vimal Kumar added a comment - 2021/11/23 3:22 PM - edited 2021/11/23 3:25 PM

rhn-support-hchaturv What was the logging version prior to the upgrade?

Can you share the logs, especially the fluentd logs?

Assignee: Vitalii Parfonov
Reporter: Himank Chaturvedi
Votes: 0
Watchers: 20
Created: 2021/11/18 11:59 AM
Updated: 2025/04/04 3:10 PM
Resolved: 2022/01/04 10:35 AM
