[LOG-1975] [release-5.3] After Upgrading to Cluster logging 5.3.0-55 receiving alerts Target Down `cluster-logging-operator`

    • Logging (Core) - Sprint 210, Logging (Core) - Sprint 211

       Description of Problem:  Hello Team, Alertmanager started firing the TargetDown alert shown below after cluster-logging was upgraded to 5.3.0-55.

      Version-Release number of selected component (if applicable):

      Server Version: 4.8.17
      cluster-logging.5.3.0-55

      How Reproducible:

      Always

      Steps To Reproduce:

         - Upgrade to cluster-logging 5.3.0-55.
         - The TargetDown alert fires in Alertmanager.

      Additional Information:

       Alert Details:

      Labels
      alertname = TargetDown
      job = cluster-logging-operator-metrics
      namespace = openshift-logging
      prometheus = openshift-monitoring/k8s
      service = cluster-logging-operator-metrics
      severity = warning
      Annotations
      description = 100% of the cluster-logging-operator-metrics/cluster-logging-operator-metrics targets in openshift-logging namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.
      summary = Some targets were not reachable from the monitoring server for an extended period of time.
      

      $ oc get pods 

      cluster-logging-operator-55c7dc97c9-pjmhp      1/1    Running    0         10h
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      elasticsearch-cdm-xxxx-1-xxxxx-xxxxx  2/2    Running    0         23h
      elasticsearch-cdm-xxxx-2-xxxxx-xxxxx  2/2    Running    0         22h
      elasticsearch-cdm-xxxx-3-xxxxx-xxxxx  2/2    Running    0         22h
      elasticsearch-im-app-27279375-8frhb            0/1    Failed     0         3d
      elasticsearch-im-app-27284220-n9jxk            0/1    Succeeded  0         14m
      elasticsearch-im-audit-27283830-nxf5n          0/1    Failed     0         6h44m
      elasticsearch-im-audit-27284220-lsslm          0/1    Succeeded  0         14m
      elasticsearch-im-infra-27283980-c62sh          0/1    Failed     0         4h14m
      elasticsearch-im-infra-27284220-xm8b5          0/1    Succeeded  0         14m
      kibana-57c7d75755-xxxx                        2/2    Running    0         1d
      
      

      $ oc get service 

      cluster-logging-operator-metrics  ClusterIP  172.30.51.195  <none>       8383/TCP,8686/TCP   67d
      collector                         ClusterIP  172.30.13.223  <none>       24231/TCP,2112/TCP  1d
      elasticsearch                     ClusterIP  172.30.15.24   <none>       9200/TCP            150d
      elasticsearch-cluster             ClusterIP  172.30.21.182  <none>       9300/TCP            150d
      elasticsearch-metrics             ClusterIP  172.30.70.112  <none>       60001/TCP           150d
      kibana                            ClusterIP  172.30.249.34  <none>       443/TCP             150d

      Curl output:

      sh-4.4$ curl -kvv http://172.30.51.195:8686/metrics
      *   Trying 172.30.51.195...
      * TCP_NODELAY set
      * connect to 172.30.51.195 port 8686 failed: Connection refused
      * Failed to connect to 172.30.51.195 port 8686: Connection refused
      * Closing connection 0
      curl: (7) Failed to connect to 172.30.51.195 port 8686: Connection refused
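
The Service listing above publishes two ports (8383 and 8686), while the curl test covers only 8686. A minimal sketch for probing each published port over TCP follows. `probe_port` is a hypothetical helper name, and the sketch assumes `bash` and the GNU coreutils `timeout` command are available in the pod it runs from.

```shell
# probe_port (hypothetical helper): report whether HOST:PORT accepts a TCP
# connection. It uses bash's /dev/tcp pseudo-device inside a child bash so the
# caller's own shell does not need /dev/tcp support.
probe_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "${host}:${port} open"
  else
    # Covers refused connections, timeouts, and unreachable hosts alike.
    echo "${host}:${port} refused or unreachable"
  fi
}

# Example, using the ClusterIP and ports from the service listing above:
#   for p in 8383 8686; do probe_port 172.30.51.195 "$p"; done
```

A port reported as refused or unreachable here points at the same symptom as the curl output, but the check does not say whether the cause is the Service definition or the operator process itself.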

      Let me know if any further details are required.

       

            Nayantara Gupta (Inactive) added a comment -

            anli@redhat.com The customer configured logging v5.3.2.20 and is still facing the same issue. Can you share what changes were made to fix the issue? I can verify whether the same changes are available in 'v5.3.2.20'.

            Anping Li added a comment -

            Fixed on cluster-logging.5.3.2-17.

            Anping Li added a comment -

            vparfono the port 8686 wasn't exposed.
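
One rough way to check for a port that is published by the Service but not declared by the operator container is to compare the two port lists. This is a sketch only: `missing_ports` is a hypothetical helper, and the Deployment name `cluster-logging-operator` is an assumption inferred from the pod name in the description. A container can also listen on a port it never declares, so an empty or non-empty result here is a hint rather than proof.

```shell
# missing_ports (hypothetical helper): print each port from the first
# space-separated list that does not appear in the second list.
missing_ports() {
  local wanted="$1" available="$2" p
  for p in $wanted; do
    case " $available " in
      *" $p "*) ;;          # port is present in the second list
      *) echo "$p" ;;       # port is missing from the second list
    esac
  done
}

# Example (Deployment name is an assumption):
#   svc_ports=$(oc get service cluster-logging-operator-metrics -n openshift-logging \
#     -o jsonpath='{.spec.ports[*].targetPort}')
#   pod_ports=$(oc get deployment cluster-logging-operator -n openshift-logging \
#     -o jsonpath='{.spec.template.spec.containers[*].ports[*].containerPort}')
#   missing_ports "$svc_ports" "$pod_ports"
```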

            Vitalii Parfonov added a comment -

            anli@redhat.com Can you take a look, please?

            Oscar Casal Sanchez added a comment -

            Hello,

            Usually the errata should be linked to this issue so that we know exactly when it was fixed. Could you link it? Otherwise, we don't know when it was fixed or in which version.

            Vitalii Parfonov added a comment -

            Hello, rhn-support-adsoni. The PR is under final review, so the fix will be available soon.

            Vitalii Parfonov added a comment -

            LGTM

            Vitalii Parfonov added a comment -

            rhn-support-tmicheli Hello, sorry for the delayed answer. Yes, as a workaround you can create a service like this:

            apiVersion: v1
            kind: Service
            metadata:
              labels:
                name: cluster-logging-operator
              name: cluster-logging-operator-metrics
              namespace: openshift-logging
            spec:
              ports:
                - name: cr-metrics
                  port: 8080
                  protocol: TCP
                  targetPort: 8080
              selector:
                name: cluster-logging-operator
              sessionAffinity: None
              type: ClusterIP

            It will work on port 8080 for now, and we will continue working on a fix for 5.3.z.
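
Applying and checking the workaround might look like the sketch below. It rests on several assumptions: the manifest is saved under the hypothetical file name `clo-metrics-service.yaml`, `metrics_url` is a hypothetical helper, the `/metrics` path is taken from the curl command in the description, and plain HTTP on port 8080 is a guess rather than something the thread confirms.

```shell
# metrics_url (hypothetical helper): build the in-cluster URL of a Service's
# metrics endpoint from its name, namespace, and port.
metrics_url() {
  local service="$1" namespace="$2" port="$3"
  echo "http://${service}.${namespace}.svc:${port}/metrics"
}

# Example (file name is hypothetical; run curl from a pod inside the cluster):
#   oc apply -f clo-metrics-service.yaml
#   oc get endpoints cluster-logging-operator-metrics -n openshift-logging
#   curl -sv "$(metrics_url cluster-logging-operator-metrics openshift-logging 8080)"
```

If `oc get endpoints` lists no addresses, the Service selector is not matching the operator pod and the curl check will fail regardless of the port.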

            Vimal Kumar added a comment -

            rhn-support-tmicheli We did some investigation and found that the metrics service is not started in 5.3. We are working on providing a fix. There is no workaround possible.

            Vimal Kumar added a comment - edited

            rhn-support-hchaturv What was the logging version prior to the upgrade?

            Can you share the logs, especially the fluentd logs?

              vparfono Vitalii Parfonov
              rhn-support-hchaturv Himank Chaturvedi