- Bug
- Resolution: Done-Errata
- Undefined
- None
- 4.16
- None
- Moderate
- No
- MON Sprint 256, MON Sprint 257
- 2
- False
- Release Note Not Required
- In Progress
Description of problem:
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-06-13-084629
How reproducible:
100%
Steps to Reproduce:
1. apply configmap
*****
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "http://invalid-remote-storage.example.com:9090/api/v1/write"
        queue_config:
          max_retries: 1
*****

2. check logs
    % oc logs -c prometheus prometheus-k8s-0 -n openshift-monitoring
    ...
    ts=2024-06-14T01:28:01.804Z caller=dedupe.go:112 component=remote level=warn remote_name=5ca657 url=http://invalid-remote-storage.example.com:9090/api/v1/write msg="Failed to send batch, retrying" err="Post \"http://invalid-remote-storage.example.com:9090/api/v1/write\": dial tcp: lookup invalid-remote-storage.example.com on 172.30.0.10:53: no such host"

3. query after 15 mins
    % oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=ALERTS{alertname="PrometheusRemoteStorageFailures"}' | jq
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   145  100    78  100    67    928    797 --:--:-- --:--:-- --:--:--  1726
    {
      "status": "success",
      "data": {
        "resultType": "vector",
        "result": [],
        "analysis": {}
      }
    }

    % oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=prometheus_remote_storage_failures_total' | jq
      % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                     Dload  Upload   Total   Spent    Left  Speed
    100   124  100    78  100    46   1040    613 --:--:-- --:--:-- --:--:--  1653
    {
      "status": "success",
      "data": {
        "resultType": "vector",
        "result": [],
        "analysis": {}
      }
    }
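The check in step 3 can be scripted so that an empty result is detected mechanically rather than by reading the JSON. This is a minimal sketch, assuming the response has the shape shown in step 3. The function name `result_is_empty` is hypothetical, and the whitespace-stripping `grep` match is a simple stand-in for a real JSON parser such as `jq`.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or Prometheus): reads a query API
# response on stdin and exits 0 when the "result" array is empty.
result_is_empty() {
  # Strip whitespace so pretty-printed and compact JSON match the same pattern.
  tr -d ' \t\r\n' | grep -q '"result":\[\]'
}

# Response body as captured in this report.
response='{ "status": "success", "data": { "resultType": "vector", "result": [], "analysis": {} } }'

if printf '%s' "$response" | result_is_empty; then
  echo "no series returned"
else
  echo "series returned"
fi
```

In a real run, the output of the `oc ... exec ... curl` command from step 3 would be piped into `result_is_empty` in place of the captured string.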
Actual results:
The alert did not trigger.
Expected results:
The alert triggers, and both the alert and the metrics are visible.
Additional info:
The following metrics show `No datapoints found.`:
- prometheus_remote_storage_failures_total
- prometheus_remote_storage_samples_dropped_total
- prometheus_remote_storage_retries_total
The value of `prometheus_remote_storage_samples_failed_total` is 0.
- blocks
-
OCPBUGS-36918 `PrometheusRemoteStorageFailures` alert failed to trigger
- Closed
- is cloned by
-
OCPBUGS-36918 `PrometheusRemoteStorageFailures` alert failed to trigger
- Closed
- links to
-
RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update