Red Hat Advanced Cluster Management
ACM-20743

TargetDown alert is shown in alertmanager-metrics of namespace open-cluster-management-observability


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Component: Observability
    • Quality / Stability / Reliability

      Description of problem:

      The customer is seeing the "TargetDown" alert fire for the alertmanager-metrics target in the open-cluster-management-observability namespace. After investigation in support case https://access.redhat.com/support/cases/04109447,

      the "TargetDown" appears to be caused by the scrape target returning HTTP status 401 Unauthorized. For the detailed attachments, please check case #04109447.
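
      The 401 can be reproduced manually from inside one of the openshift-monitoring Prometheus pods. The commands below are a diagnostic sketch: the address 10.248.13.82:9096 is taken from the target listing below, -k only skips CA verification for the test, and whether the token is accepted depends on the kube-rbac-proxy RBAC configuration:

      # oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
          curl -sk -o /dev/null -w '%{http_code}\n' https://10.248.13.82:9096/metrics          # expect 401 without credentials
      # oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
          sh -c 'curl -sk -o /dev/null -w "%{http_code}\n" -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://10.248.13.82:9096/metrics'          # expect 200 if the token is authorized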

      # oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
          curl -s 'localhost:9090/api/v1/targets?state=active' > prometheus-0-targets-active.txt
      # oc -n openshift-monitoring exec prometheus-k8s-1 -c prometheus -- \
          curl -s 'localhost:9090/api/v1/targets?state=active' > prometheus-1-targets-active.txt

      # cat prometheus-*-targets-active.txt
      
        "labels": {
          "container": "kube-rbac-proxy",
          "endpoint": "metrics",
          "instance": "10.248.13.82:9096",
          "job": "alertmanager-metrics",
          "namespace": "open-cluster-management-observability",
          "pod": "observability-alertmanager-1",
          "service": "alertmanager-metrics"
        },
        "scrapePool": "serviceMonitor/open-cluster-management-observability/alertmanager/0",
        "scrapeUrl": "https://10.248.13.82:9096/metrics",
        "globalUrl": "https://10.248.13.82:9096/metrics",
        "lastError": "server returned HTTP status 401 Unauthorized",
        "lastScrape": "2025-05-08T09:19:41.074782099Z",
        "lastScrapeDuration": 0.000753623,
        "health": "down",
        "scrapeInterval": "30s",
        "scrapeTimeout": "10s"
      }
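
      To pull only the failing targets and their last errors out of these dumps, a jq filter along these lines can be used (a sketch, assuming jq is available on the workstation where the files were saved):

      # jq -r '.data.activeTargets[]
               | select(.health == "down")
               | [.labels.namespace, .labels.job, .scrapeUrl, .lastError]
               | @tsv' prometheus-*-targets-active.txt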

      Editing the ServiceMonitor and adding the line bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token resolves the issue:

      # oc -n open-cluster-management-observability edit servicemonitor alertmanager
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: alertmanager
        namespace: open-cluster-management-observability
      spec:
        endpoints:
        - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token    # <--- add this line
          interval: 30s
          port: metrics
          scheme: https
          tlsConfig:
            caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
            certFile: /etc/prometheus/secrets/metrics-client-certs/tls.crt
            keyFile: /etc/prometheus/secrets/metrics-client-certs/tls.key
            serverName: alertmanager-metrics.open-cluster-management-observability.svc
        selector:
          matchLabels:
            app: multicluster-observability-alertmanager-metrics 
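
      The same change can also be applied non-interactively. The patch below is a sketch that assumes the metrics endpoint is the first entry in spec.endpoints; note that if the multicluster-observability-operator reconciles this ServiceMonitor, a manual edit may be reverted, which is part of why an operator-side fix is requested below.

      # oc -n open-cluster-management-observability patch servicemonitor alertmanager \
          --type=json \
          -p '[{"op": "add", "path": "/spec/endpoints/0/bearerTokenFile", "value": "/var/run/secrets/kubernetes.io/serviceaccount/token"}]'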

       

      [Request]: Could the Engineering team check whether the root cause of this "TargetDown" alert is the missing bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token setting, and then fix the issue?

      Version-Release number of selected component (if applicable):

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install observability so that the alertmanager-metrics service and its ServiceMonitor exist in the open-cluster-management-observability namespace.
      2. Check the active targets in the openshift-monitoring Prometheus pods (see the commands above).

      Actual results:

      "TargetDown" alert is shown in alertmanager-metrics of namespace open-cluster-management-observability

      Expected results:

      "Target" of alertmanager-metrics of namespace open-cluster-management-observability could be accessed by the openshift-monitoring prometheus pods.

      Additional info:

      Related KCS:

      "100% of the alertmanager-metrics/alertmanager-metrics targets in the open-cluster-management-observability namespace have been unreachable for more than 15 minutes."

      https://access.redhat.com/solutions/7105331

              Assignee: rh-ee-jachanse Jacob Baungard Hansen
              Reporter: rhn-support-jiewu Jie Wu