OpenShift Logging / LOG-2092

After upgrading to cluster-logging 5.3.0-55, receiving TargetDown alerts for `cluster-logging-operator`


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • Labels: None
    • Affects Version/s: Logging 5.3.0
    • Component/s: Log Collection
    • NEW
    • VERIFIED

    Description

   Description of Problem:  Hello Team, Alertmanager started firing the alert below after cluster-logging was upgraded to 5.3.0-55.

      Version-Release number of selected component (if applicable):

      Server Version: 4.8.17
      cluster-logging.5.3.0-55

      How Reproducible:

      Always

      Steps To Reproduce:

         - Upgrade to cluster-logging 5.3.0-55
         - The TargetDown alert fires in Alertmanager (one way to confirm it is sketched below)
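
      One way to confirm the target is actually down is to query the in-cluster
      Prometheus targets API. A minimal sketch, assuming access to the
      openshift-monitoring namespace and jq on the workstation (the jq filter is
      illustrative):

      # Forward the in-cluster Prometheus API to the workstation
      $ oc -n openshift-monitoring port-forward pod/prometheus-k8s-0 9090:9090 &
      # List scrape targets whose health is "down" and show why
      $ curl -s http://127.0.0.1:9090/api/v1/targets \
          | jq '.data.activeTargets[] | select(.health=="down") | {job: .labels.job, lastError}'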

      Additional Information:

       Alert Details:

      Labels
      alertname = TargetDown
      job = cluster-logging-operator-metrics
      namespace = openshift-logging
      prometheus = openshift-monitoring/k8s
      service = cluster-logging-operator-metrics
      severity = warning
      Annotations
      description = 100% of the cluster-logging-operator-metrics/cluster-logging-operator-metrics targets in openshift-logging namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.
      summary = Some targets were not reachable from the monitoring server for an extended period of time.
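
      Since the alert points at the cluster-logging-operator-metrics Service, it is
      worth checking the Endpoints object behind it and the ServiceMonitor that tells
      Prometheus to scrape it. A sketch (resource names taken from the alert labels;
      the ServiceMonitor name must be read from the list output):

      # Does the Service still have ready endpoints?
      $ oc -n openshift-logging get endpoints cluster-logging-operator-metrics -o wide
      # Which ServiceMonitors exist, and which port do they scrape?
      $ oc -n openshift-logging get servicemonitors
      $ oc -n openshift-logging get servicemonitor <name-from-previous-output> -o yaml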
      

      $ oc get pods 

      cluster-logging-operator-55c7dc97c9-pjmhp      1/1    Running    0         10h
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      collector-xxxx                                2/2    Running    0         1d
      elasticsearch-cdm-xxxx-1-xxxxx-xxxxx  2/2    Running    0         23h
      elasticsearch-cdm-xxxx-2-xxxxx-xxxxx  2/2    Running    0         22h
      elasticsearch-cdm-xxxx-3-xxxxx-xxxxx  2/2    Running    0         22h
      elasticsearch-im-app-27279375-8frhb            0/1    Failed     0         3d
      elasticsearch-im-app-27284220-n9jxk            0/1    Succeeded  0         14m
      elasticsearch-im-audit-27283830-nxf5n          0/1    Failed     0         6h44m
      elasticsearch-im-audit-27284220-lsslm          0/1    Succeeded  0         14m
      elasticsearch-im-infra-27283980-c62sh          0/1    Failed     0         4h14m
      elasticsearch-im-infra-27284220-xm8b5          0/1    Succeeded  0         14m
      kibana-57c7d75755-xxxx                        2/2    Running    0         1d
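
      The operator pod itself is Running, so a useful next check is whether its
      container still declares the 8383/8686 metrics ports that the Service routes
      to. A read-only sketch using jsonpath:

      $ oc -n openshift-logging get deployment cluster-logging-operator \
          -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{": "}{.ports}{"\n"}{end}'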
      
      

      $ oc get service 

      cluster-logging-operator-metrics  ClusterIP  172.30.51.195  <none>       8383/TCP,8686/TCP   67d
      collector                         ClusterIP  172.30.13.223  <none>       24231/TCP,2112/TCP  1d
      elasticsearch                     ClusterIP  172.30.15.24   <none>       9200/TCP            150d
      elasticsearch-cluster             ClusterIP  172.30.21.182  <none>       9300/TCP            150d
      elasticsearch-metrics             ClusterIP  172.30.70.112  <none>       60001/TCP           150d
      kibana                            ClusterIP  172.30.249.34  <none>       443/TCP             150d
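
      It also helps to confirm that the Service selector still matches the labels on
      the operator pod; a stale Service left behind by a previous operator version
      would keep the scrape target defined with nothing listening behind it. A
      sketch (compare the two outputs by hand):

      $ oc -n openshift-logging get service cluster-logging-operator-metrics -o jsonpath='{.spec.selector}'
      $ oc -n openshift-logging get pods --show-labels | grep cluster-logging-operator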

      Curl output:

      sh-4.4$ curl -kvv http://172.30.51.195:8686/metrics
      *   Trying 172.30.51.195...
      * TCP_NODELAY set
      * connect to 172.30.51.195 port 8686 failed: Connection refused
      * Failed to connect to 172.30.51.195 port 8686: Connection refused
      * Closing connection 0
      curl: (7) Failed to connect to 172.30.51.195 port 8686: Connection refused
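
      "Connection refused" at the Service IP usually means nothing is listening on
      the target port, rather than a network-path problem. One way to separate the
      two is to bypass the Service and talk to the operator pod directly; a sketch,
      assuming the deployment shown above:

      $ oc -n openshift-logging port-forward deploy/cluster-logging-operator 8686:8686
      # in another shell:
      $ curl -v http://127.0.0.1:8686/metrics

      If this also fails with "connection refused", the operator process is simply
      not serving metrics on port 8686 in this release, and the Service (plus the
      ServiceMonitor pointing at it) is stale, which would match a TargetDown alert
      against an otherwise healthy operator pod.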

      Let me know if any further details are required.

       


            People

              Assignee: Vitalii Parfonov (vparfono)
              Reporter: Vitalii Parfonov (vparfono)
              QA Contact: Qiaoling Tang
              Votes: 0
              Watchers: 3
