Red Hat Advanced Cluster Management
ACM-20743

TargetDown alert is shown in alertmanager-metrics of namespace open-cluster-management-observability


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: Observability
    • Quality / Stability / Reliability

      Description of problem:

      The customer is seeing the "TargetDown" alert fire for the alertmanager-metrics target in the open-cluster-management-observability namespace. After investigation in support case https://access.redhat.com/support/cases/04109447,

      the "TargetDown" appears to be caused by the scrape target returning HTTP status 401 Unauthorized. For the detailed attachments, please check case #04109447.
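
      The 401 can be reproduced manually from inside one of the openshift-monitoring Prometheus pods. The commands below are a diagnostic sketch: the address 10.248.13.82:9096 is taken from the target listing below, -k only skips CA verification for the test, and whether the token is accepted depends on the kube-rbac-proxy RBAC configuration:

      # oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
          curl -sk -o /dev/null -w '%{http_code}\n' https://10.248.13.82:9096/metrics          # expect 401 without credentials
      # oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
          sh -c 'curl -sk -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://10.248.13.82:9096/metrics'          # expect 200 if the token is authorized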

      # oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
          curl -s 'localhost:9090/api/v1/targets?state=active' > prometheus-0-targets-active.txt
      # oc -n openshift-monitoring exec prometheus-k8s-1 -c prometheus -- \
          curl -s 'localhost:9090/api/v1/targets?state=active' > prometheus-1-targets-active.txt

      # cat prometheus-*-targets-active.txt
      
        "labels": {
          "container": "kube-rbac-proxy",
          "endpoint": "metrics",
          "instance": "10.248.13.82:9096",
          "job": "alertmanager-metrics",
          "namespace": "open-cluster-management-observability",
          "pod": "observability-alertmanager-1",
          "service": "alertmanager-metrics"
        },
        "scrapePool": "serviceMonitor/open-cluster-management-observability/alertmanager/0",
        "scrapeUrl": "https://10.248.13.82:9096/metrics",
        "globalUrl": "https://10.248.13.82:9096/metrics",
        "lastError": "server returned HTTP status 401 Unauthorized",
        "lastScrape": "2025-05-08T09:19:41.074782099Z",
        "lastScrapeDuration": 0.000753623,
        "health": "down",
        "scrapeInterval": "30s",
        "scrapeTimeout": "10s"
      }
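
      To pull only the failing targets and their last errors out of these dumps, a jq filter along these lines can be used (a sketch, assuming jq is available on the workstation where the files were saved):

      # jq -r '.data.activeTargets[]
               | select(.health == "down")
               | [.labels.namespace, .labels.job, .scrapeUrl, .lastError]
               | @tsv' prometheus-*-targets-active.txt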

      Editing the ServiceMonitor and adding the line bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token resolves the issue:

      # oc -n open-cluster-management-observability edit servicemonitor alertmanager
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: alertmanager
        namespace: open-cluster-management-observability
      spec:
        endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token    # <--- add this line
          interval: 30s
          port: metrics
          scheme: https
          tlsConfig:
            caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
            certFile: /etc/prometheus/secrets/metrics-client-certs/tls.crt
            keyFile: /etc/prometheus/secrets/metrics-client-certs/tls.key
            serverName: alertmanager-metrics.open-cluster-management-observability.svc
        selector:
          matchLabels:
            app: multicluster-observability-alertmanager-metrics 
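
      The same change can also be applied non-interactively. The patch below is a sketch that assumes the metrics endpoint is the first entry in spec.endpoints; note that if the multicluster-observability-operator reconciles this ServiceMonitor, a manual edit may be reverted, which is part of why an operator-side fix is requested below.

      # oc -n open-cluster-management-observability patch servicemonitor alertmanager \
          --type=json \
          -p '[{"op": "add", "path": "/spec/endpoints/0/bearerTokenFile", "value": "/var/run/secrets/kubernetes.io/serviceaccount/token"}]'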

       

      [Request]: Could the Engineering team check whether the root cause of this "TargetDown" alert is the missing bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token setting, and then fix the issue?

      Version-Release number of selected component (if applicable):

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install observability so that the alertmanager-metrics service and its ServiceMonitor exist in the open-cluster-management-observability namespace.
      2. Check the active targets in the openshift-monitoring Prometheus pods (see the commands above).

      Actual results:

      "TargetDown" alert is shown in alertmanager-metrics of namespace open-cluster-management-observability

      Expected results:

      "Target" of alertmanager-metrics of namespace open-cluster-management-observability could be accessed by the openshift-monitoring prometheus pods.

      Additional info:

      Related KCS:

      "100% of the alertmanager-metrics/alertmanager-metrics targets in the open-cluster-management-observability namespace have been unreachable for more than 15 minutes."

      https://access.redhat.com/solutions/7105331

              Assignee: rh-ee-jachanse Jacob Baungard Hansen
              Reporter: rhn-support-jiewu Jie Wu