- Bug
- Resolution: Done
- Normal
- Logging 5.3.0
- False
- False
- NEW
- NEW
- Logging (Core) - Sprint 210, Logging (Core) - Sprint 211
Description of Problem: Hello Team, AlertManager started throwing this alert after cluster-logging was upgraded to 5.3.0-55.
Version-Release number of selected component (if applicable):
Server Version: 4.8.17
cluster-logging.5.3.0-55
How Reproducible:
Always
Steps To Reproduce:
- Upgrade to cluster-logging 5.3.0-55
- The TargetDown alert fires in Alertmanager (see the check below)
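A quick way to confirm the alert is firing is to query the Alertmanager API. This is a hedged sketch, assuming the default alertmanager-main route in openshift-monitoring, an authenticated oc session, and jq on the workstation; adjust names to your cluster:
$ TOKEN=$(oc whoami -t)
$ HOST=$(oc -n openshift-monitoring get route alertmanager-main -o jsonpath='{.spec.host}')
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v2/alerts" \
    | jq '.[] | select(.labels.alertname == "TargetDown") | .labels'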
Additional Information:
Alert Details:
Labels:
  alertname = TargetDown
  job = cluster-logging-operator-metrics
  namespace = openshift-logging
  prometheus = openshift-monitoring/k8s
  service = cluster-logging-operator-metrics
  severity = warning
Annotations:
  description = 100% of the cluster-logging-operator-metrics/cluster-logging-operator-metrics targets in openshift-logging namespace have been unreachable for more than 15 minutes. This may be a symptom of network connectivity issues, down nodes, or failures within these components. Assess the health of the infrastructure and nodes running these targets and then contact support.
  summary = Some targets were not reachable from the monitoring server for an extended period of time.
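TargetDown is driven by the up metric of the scrape target, so the target state can also be checked directly with PromQL. A hedged sketch, assuming the default thanos-querier route in openshift-monitoring and an authenticated oc session:
$ TOKEN=$(oc whoami -t)
$ HOST=$(oc -n openshift-monitoring get route thanos-querier -o jsonpath='{.spec.host}')
$ curl -sk -H "Authorization: Bearer $TOKEN" "https://$HOST/api/v1/query" \
    --data-urlencode 'query=up{job="cluster-logging-operator-metrics", namespace="openshift-logging"}'
A value of 0 (or no series at all) matches what the alert reports.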
$ oc get pods
cluster-logging-operator-55c7dc97c9-pjmhp 1/1 Running 0 10h
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
collector-xxxx 2/2 Running 0 1d
elasticsearch-cdm-xxxx-1-xxxxx-xxxxx 2/2 Running 0 23h
elasticsearch-cdm-xxxx-2-xxxxx-xxxxx 2/2 Running 0 22h
elasticsearch-cdm-xxxx-3-xxxxx-xxxxx 2/2 Running 0 22h
elasticsearch-im-app-27279375-8frhb 0/1 Failed 0 3d
elasticsearch-im-app-27284220-n9jxk 0/1 Succeeded 0 14m
elasticsearch-im-audit-27283830-nxf5n 0/1 Failed 0 6h44m
elasticsearch-im-audit-27284220-lsslm 0/1 Succeeded 0 14m
elasticsearch-im-infra-27283980-c62sh 0/1 Failed 0 4h14m
elasticsearch-im-infra-27284220-xm8b5 0/1 Succeeded 0 14m
kibana-57c7d75755-xxxx 2/2 Running 0 1d
$ oc get service
cluster-logging-operator-metrics ClusterIP 172.30.51.195 <none> 8383/TCP,8686/TCP 67d
collector ClusterIP 172.30.13.223 <none> 24231/TCP,2112/TCP 1d
elasticsearch ClusterIP 172.30.15.24 <none> 9200/TCP 150d
elasticsearch-cluster ClusterIP 172.30.21.182 <none> 9300/TCP 150d
elasticsearch-metrics ClusterIP 172.30.70.112 <none> 60001/TCP 150d
kibana ClusterIP 172.30.249.34 <none> 443/TCP 150d
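To see whether the metrics Service has any backing pods, the endpoints can be listed. This assumes the namespace and service names shown above:
$ oc -n openshift-logging get endpoints cluster-logging-operator-metrics -o wide
An empty ADDRESSES column would mean the selector matches no pods; listed addresses point instead at the port check below.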
Curl output:
sh-4.4$ curl -kvv http://172.30.51.195:8686/metrics
* Trying 172.30.51.195...
* TCP_NODELAY set
* connect to 172.30.51.195 port 8686 failed: Connection refused
* Failed to connect to 172.30.51.195 port 8686: Connection refused
* Closing connection 0
curl: (7) Failed to connect to 172.30.51.195 port 8686: Connection refused
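The connection refused on 8686 suggests comparing the ports the operator container exposes with the ports the Service targets. A hedged diagnostic sketch, assuming the deployment is named cluster-logging-operator:
$ oc -n openshift-logging get deployment cluster-logging-operator -o jsonpath='{.spec.template.spec.containers[*].ports}{"\n"}'
$ oc -n openshift-logging get service cluster-logging-operator-metrics -o jsonpath='{.spec.ports}{"\n"}'
If 8383/8686 no longer appear on the container after the 5.3.0-55 upgrade, the Service (and its ServiceMonitor) is pointing at ports nothing listens on, which would produce exactly this alert.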
Let me know in case any further details are required.
- is cloned by: LOG-2090 After Upgrading to Cluster logging 5.3.0-55 receiving alerts Target Down `cluster-logging-operator` (Closed)
- links to