-
Bug
-
Resolution: Done-Errata
-
Undefined
-
4.16
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
Moderate
-
No
-
None
-
None
-
MON Sprint 256, MON Sprint 257
-
2
-
In Progress
-
Release Note Not Required
-
-
None
-
None
-
None
-
None
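Description of problem:
The `PrometheusRemoteStorageFailures` alert does not trigger when remote write is configured with an unreachable endpoint and every send attempt fails.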
Version-Release number of selected component (if applicable):
4.16.0-0.nightly-2024-06-13-084629
How reproducible:
100%
Steps to Reproduce:
1. Apply the ConfigMap
*****
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      remoteWrite:
      - url: "http://invalid-remote-storage.example.com:9090/api/v1/write"
        queue_config:
          max_retries: 1
*****
2. Check the logs
% oc logs -c prometheus prometheus-k8s-0 -n openshift-monitoring
...
ts=2024-06-14T01:28:01.804Z caller=dedupe.go:112 component=remote level=warn remote_name=5ca657 url=http://invalid-remote-storage.example.com:9090/api/v1/write msg="Failed to send batch, retrying" err="Post \"http://invalid-remote-storage.example.com:9090/api/v1/write\": dial tcp: lookup invalid-remote-storage.example.com on 172.30.0.10:53: no such host"
3. Query the alert and the metric after 15 minutes
% oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=ALERTS{alertname="PrometheusRemoteStorageFailures"}' | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 145 100 78 100 67 928 797 --:--:-- --:--:-- --:--:-- 1726
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [],
    "analysis": {}
  }
}
% oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://thanos-querier.openshift-monitoring.svc:9091/api/v1/query?' --data-urlencode 'query=prometheus_remote_storage_failures_total' | jq
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 124 100 78 100 46 1040 613 --:--:-- --:--:-- --:--:-- 1653
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [],
    "analysis": {}
  }
}
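The check in step 3 can be scripted rather than read by eye. The following is a minimal sketch, not part of the original report: it assumes the response body has the shape shown above, and the helper name `series_count` is hypothetical. It counts the series in an instant-query response, so an empty `result` vector is reported as zero.

```python
import json


def series_count(response_text: str) -> int:
    """Return the number of series in a Prometheus instant-query response."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        # The query API reports failures with status "error" and an "error" message.
        raise ValueError(f"query failed: {body.get('error', 'unknown error')}")
    return len(body["data"]["result"])


# Response body copied from the two queries in step 3.
EMPTY_RESPONSE = (
    '{"status": "success", "data": {"resultType": "vector", '
    '"result": [], "analysis": {}}}'
)

if __name__ == "__main__":
    count = series_count(EMPTY_RESPONSE)
    print("series found" if count else "no matching series")
```

A count of zero for the `ALERTS{alertname="PrometheusRemoteStorageFailures"}` query means the alert is neither pending nor firing.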
Actual results:
The alert did not trigger.
Expected results:
The alert triggers, and both the alert and the related metrics are visible.
Additional info:
The following metrics show `No datapoints found.`:
- prometheus_remote_storage_failures_total
- prometheus_remote_storage_samples_dropped_total
- prometheus_remote_storage_retries_total
The value of `prometheus_remote_storage_samples_failed_total` is 0.
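To illustrate why a failed-samples counter that stays at 0 keeps the alert silent, here is a rough sketch. It assumes the alert compares the share of failed samples against a percentage threshold, as the upstream Prometheus alerting rules do; the 1% threshold and the function names are assumptions, and the rule actually shipped in the cluster may differ.

```python
def failure_percent(failed_rate: float, succeeded_rate: float) -> float:
    """Percentage of samples that failed, given per-second rates."""
    total = failed_rate + succeeded_rate
    if total == 0:
        # No samples were counted either way, so there is nothing to compare.
        return 0.0
    return failed_rate * 100 / total


def would_fire(failed_rate: float, succeeded_rate: float,
               threshold_percent: float = 1.0) -> bool:
    """True when the failure share exceeds the (assumed) threshold."""
    return failure_percent(failed_rate, succeeded_rate) > threshold_percent


if __name__ == "__main__":
    # Failed-samples counter stuck at 0, as reported above.
    print(would_fire(failed_rate=0.0, succeeded_rate=0.0))
    # Hypothetical case where failures are actually counted.
    print(would_fire(failed_rate=5.0, succeeded_rate=95.0))
```

Under these assumptions, the condition can never be met while the failed-samples rate is 0, no matter how many send attempts are retried.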
- blocks: OCPBUGS-36918 `PrometheusRemoteStorageFailures` alert failed to trigger (Closed)
- is cloned by: OCPBUGS-36918 `PrometheusRemoteStorageFailures` alert failed to trigger (Closed)
- links to: RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update