Bug | Resolution: Unresolved | Major | 4.16.z | Quality / Stability / Reliability | Important | MON Sprint 270, MON Sprint 271 | Done | Release Note Not Required
Description of problem:
The customer upgraded from OpenShift Container Platform 4.15 to OpenShift Container Platform 4.16.28 and is now seeing "PrometheusDuplicateTimestamp" alerts. Specifically, Prometheus logs the following warning for the openshift-state-metrics ServiceMonitor:

~~~
ts=2025-01-21T00:10:33.052Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-monitoring/openshift-state-metrics/0 target=https://10.125.8.88:8443/metrics msg="Error on ingesting samples with different value but same timestamp" num_dropped=16
~~~

When the endpoint is queried manually, the metric "openshift_route_status" is reported multiple times with an identical label set. In the output below all duplicates carry the same value, but at times the duplicated series appear to carry different values:

~~~
$ curl --cacert /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt \
    --key /etc/prometheus/secrets/metrics-client-certs/tls.key \
    --cert /etc/prometheus/secrets/metrics-client-certs/tls.crt \
    -k https://10.128.2.7:8443/metrics | sort | uniq -c | sort -n
[..]
2 openshift_route_status{namespace="argoant",route="florida-gateway-florida-gateway",status="True",type="Admitted",host="blue.example.com",router_name="default"} 1
2 openshift_route_status{namespace="blue-staging",route="node",status="True",type="Admitted",host="node-blue-staging.example.com",router_name="default"} 1
2 openshift_route_status{namespace="blue-staging",route="noding",status="True",type="Admitted",host="noding-blue-staging.example.com",router_name="default"} 1
2 openshift_route_status{namespace="devspaces-operator",route="devworkspace-che-test-2d0cf27f",status="True",type="Admitted",host="devworkspace-che-test-2d0cf27f-devspaces-operator.example.com",router_name="default"} 1
2 openshift_route_status{namespace="devspaces-operator",route="devworkspace-che-test-33ace4c4",status="True",type="Admitted",host="devworkspace-che-test-33ace4c4-devspaces-operator.example.com",router_name="default"} 1
[..]
~~~
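The `sort | uniq -c` pipeline above counts whole lines, so it only flags duplicates whose values also match. The sketch below strips the sample value before counting, so series that repeat with different values (the case Prometheus drops with "different value but same timestamp") are reported as well. It assumes the Prometheus text exposition format with the value as the last space-separated field and no trailing timestamp; the function name `find_duplicate_series` is hypothetical.

~~~shell
# find_duplicate_series (hypothetical helper): reads Prometheus text-format
# metrics on stdin and prints every series (metric name + label set) that
# appears more than once, prefixed by its count.
# Assumption: each sample line ends with " <value>" and has no timestamp.
find_duplicate_series() {
  grep -v '^#' \
    | awk 'NF { sub(/ [^ ]+$/, ""); print }' \
    | sort | uniq -c | awk '$1 > 1'
  # grep drops # HELP / # TYPE comment lines,
  # the first awk removes the trailing sample value (and skips blank lines),
  # uniq -c counts identical series, the last awk keeps counts above 1.
}
~~~

For example, pipe the output of the curl command shown above, without the `sort | uniq -c | sort -n` stage, into `find_duplicate_series`.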
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.16.28
How reproducible:
Reproduced on the customer's cluster
Steps to Reproduce:
1. Upgrade to OpenShift Container Platform 4.16.28
2. Have multiple Routes in the cluster
Actual results:
The PrometheusDuplicateTimestamp alert fires because openshift-state-metrics exposes duplicate metrics
Expected results:
"openshift-state-metrics" exposes no duplicate metrics
Additional info:
- Prometheus logs are available in Support Case 04037739
- ServiceMonitor output is available in Support Case 04037739
- Full openshift-state-metrics "/metrics" output is available in Support Case 04037739