Bug
Resolution: Unresolved
Major
Quality / Stability / Reliability
Description of problem:
The customer is encountering a "TargetDown" alert for the alertmanager-metrics target in the open-cluster-management-observability namespace. After investigation in support case https://access.redhat.com/support/cases/04109447, the "TargetDown" alert appears to be caused by the server returning HTTP status 401 Unauthorized; for the detailed attachment, see case #04109447.
# oc -n openshift-monitoring rsh prometheus-k8s-0
# curl -s 'localhost:9090/api/v1/targets?state=active' > prometheus-0-targets-active.txt
# oc -n openshift-monitoring rsh prometheus-k8s-1
# curl -s 'localhost:9090/api/v1/targets?state=active' > prometheus-1-targets-active.txt
# cat prometheus-*-targets-active.txt
{
  "labels": {
    "container": "kube-rbac-proxy",
    "endpoint": "metrics",
    "instance": "10.248.13.82:9096",
    "job": "alertmanager-metrics",
    "namespace": "open-cluster-management-observability",
    "pod": "observability-alertmanager-1",
    "service": "alertmanager-metrics"
  },
  "scrapePool": "serviceMonitor/open-cluster-management-observability/alertmanager/0",
  "scrapeUrl": "https://10.248.13.82:9096/metrics",
  "globalUrl": "https://10.248.13.82:9096/metrics",
  "lastError": "server returned HTTP status 401 Unauthorized",
  "lastScrape": "2025-05-08T09:19:41.074782099Z",
  "lastScrapeDuration": 0.000753623,
  "health": "down",
  "scrapeInterval": "30s",
  "scrapeTimeout": "10s"
}
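For triage, the captured targets output can be filtered programmatically instead of reading it by eye. A minimal sketch, assuming the JSON follows the Prometheus /api/v1/targets response shape; the embedded sample mirrors (in abridged form) the target entry shown above:

```python
import json

# Abridged sample mirroring the captured alertmanager-metrics target above.
targets_json = """
{
  "data": {
    "activeTargets": [
      {
        "labels": {
          "job": "alertmanager-metrics",
          "namespace": "open-cluster-management-observability",
          "pod": "observability-alertmanager-1"
        },
        "scrapeUrl": "https://10.248.13.82:9096/metrics",
        "lastError": "server returned HTTP status 401 Unauthorized",
        "health": "down"
      }
    ]
  }
}
"""

def down_targets(payload: str):
    """Return (scrapeUrl, lastError) for every active target whose health is 'down'."""
    data = json.loads(payload)
    return [
        (t["scrapeUrl"], t["lastError"])
        for t in data["data"]["activeTargets"]
        if t.get("health") == "down"
    ]

for url, err in down_targets(targets_json):
    print(f"{url}: {err}")
```

Run against a real `prometheus-*-targets-active.txt` capture, this lists every failing scrape URL with its last error, which makes the 401 pattern easy to spot.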
Editing the ServiceMonitor to add the line bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token resolves the issue.
# oc -n open-cluster-management-observability edit servicemonitor alertmanager
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: alertmanager
  namespace: open-cluster-management-observability
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token   # <--- add this line
    interval: 30s
    port: metrics
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      certFile: /etc/prometheus/secrets/metrics-client-certs/tls.crt
      keyFile: /etc/prometheus/secrets/metrics-client-certs/tls.key
      serverName: alertmanager-metrics.open-cluster-management-observability.svc
  selector:
    matchLabels:
      app: multicluster-observability-alertmanager-metrics
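Whether an endpoint is missing the bearerTokenFile setting can also be checked programmatically. A minimal sketch operating on the ServiceMonitor as a plain dict (e.g. obtained via `oc get servicemonitor alertmanager -o json`); the inline sample is abridged and deliberately omits bearerTokenFile to reproduce the broken state:

```python
import json

TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"

# Abridged ServiceMonitor in the pre-fix state (no bearerTokenFile on the endpoint).
servicemonitor = json.loads("""
{
  "kind": "ServiceMonitor",
  "metadata": {
    "name": "alertmanager",
    "namespace": "open-cluster-management-observability"
  },
  "spec": {
    "endpoints": [
      {"interval": "30s", "port": "metrics", "scheme": "https"}
    ]
  }
}
""")

def endpoints_missing_token(sm: dict) -> list:
    """Return indices of spec.endpoints entries lacking the bearerTokenFile setting."""
    return [
        i for i, ep in enumerate(sm["spec"]["endpoints"])
        if ep.get("bearerTokenFile") != TOKEN_PATH
    ]

missing = endpoints_missing_token(servicemonitor)
if missing:
    print(f"endpoints missing bearerTokenFile: {missing}")
```

After the fix described above is applied, the list comes back empty; before it, index 0 is flagged.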
[Request]: Could the Engineering team check whether the root cause of this "TargetDown" alert is the missing bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token setting, and then fix the issue?
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Install alertmanager-metrics in the open-cluster-management-observability namespace.
Actual results:
The "TargetDown" alert fires for the alertmanager-metrics target in the open-cluster-management-observability namespace.
Expected results:
The alertmanager-metrics target in the open-cluster-management-observability namespace can be scraped by the openshift-monitoring Prometheus pods.
Additional info:
Related kcs:
100% of the alertmanager-metrics targets in the open-cluster-management-observability namespace have been unreachable for more than 15 minutes.