Bug
Resolution: Done
Critical
odf-4.16
None
The following alert is firing: PrometheusDuplicateTimestamps.
Turning on debug logging shows that the noobaa-mgmt-service-monitor and s3-service-monitor are the source of the issue, for example:
ts=2024-08-09T12:35:11.506Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/openshift-storage/noobaa-mgmt-service-monitor/0 target=http://10.129.2.15:8080/metrics/web_server msg="Duplicate sample for timestamp" series=NooBaa_health_status
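With debug logging on, every duplicate is reported as a logfmt line like the one above. The Python sketch below tallies those lines per scrape_pool so the offending service monitors stand out. It assumes the Prometheus log has been saved as plain text with one entry per line; parse_logfmt and duplicate_pools are hypothetical helpers, not part of Prometheus.

```python
import shlex
from collections import Counter


def parse_logfmt(line: str) -> dict[str, str]:
    """Parse one logfmt line (key=value or key="quoted value") into a dict."""
    try:
        # shlex drops the surrounding double quotes and keeps quoted values with spaces intact.
        tokens = shlex.split(line)
    except ValueError:
        return {}  # unbalanced quotes: treat the line as unparseable
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:  # ignore stray tokens without '=', such as a leading line number
            fields[key] = value
    return fields


def duplicate_pools(lines) -> Counter:
    """Count "Duplicate sample for timestamp" messages per scrape_pool."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("msg") == "Duplicate sample for timestamp":
            counts[fields.get("scrape_pool", "<unknown>")] += 1
    return counts
```

For example, duplicate_pools(open("prometheus.log")).most_common() (file name hypothetical) lists each scrape pool with its number of duplicate-sample messages, highest first.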
This can be confirmed by curling the metrics endpoint manually:
oc exec prometheus-k8s-1 -- curl "http://10.129.2.15:8080/metrics/web_server" > metrics.txt
The output does contain duplicate metrics, which can be seen by searching it for 'NooBaa_health_status 0', for example.
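Searching by hand only checks one series. A minimal Python sketch that lists every series occurring more than once in a saved scrape such as metrics.txt, assuming the Prometheus text exposition layout (name{labels} value [timestamp]); series_key and find_duplicate_series are hypothetical helpers, and label order is not normalised.

```python
import re
from collections import Counter

# A metric name runs up to the first whitespace or opening brace.
_NAME = re.compile(r"[^\s{]+")


def series_key(sample_line: str) -> str:
    """Return the series identity (metric name plus label set) of one sample line.

    Labels are not re-ordered, so the same label set written in a different
    order is treated as a different series.
    """
    name = _NAME.match(sample_line).group(0)
    if sample_line.startswith("{", len(name)):
        # Labeled sample: the identity ends at the last closing brace.
        return sample_line[: sample_line.rfind("}") + 1]
    return name


def find_duplicate_series(text: str) -> dict[str, int]:
    """Map every series that occurs more than once to its number of occurrences."""
    lines = (raw.strip() for raw in text.splitlines())
    counts = Counter(
        series_key(line) for line in lines if line and not line.startswith("#")
    )
    return {series: n for series, n in counts.items() if n > 1}
```

Running find_duplicate_series(open("metrics.txt").read()) on the output of the curl command should report NooBaa_health_status among the duplicated series if the scrape matches the one described here.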
On investigation, the duplication comes from this block:
https://github.com/noobaa/noobaa-core/blob/ad73e9cb3bd483f6f34de9a28a9f4ba3ea060eb3/src/server/analytic_services/prometheus_reporting.js#L44
If I call /metrics/web_server/nodejs and /metrics/web_server/core separately, they return the same results.
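One way to check this claim is to fetch both sub-paths and intersect the series they expose. The sketch assumes, as a reading of the linked block, that /metrics/web_server serves the /nodejs and /core output together, so any series present in both would be emitted twice. series_set, shared_series and fetch are hypothetical helpers.

```python
import re
import urllib.request

# A metric name runs up to the first whitespace or opening brace.
_NAME = re.compile(r"[^\s{]+")


def series_set(text: str) -> set[str]:
    """Collect the series identities (name plus labels) found in exposition text."""
    found = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = _NAME.match(line).group(0)
        labeled = line.startswith("{", len(name))
        found.add(line[: line.rfind("}") + 1] if labeled else name)
    return found


def shared_series(text_a: str, text_b: str) -> list[str]:
    """Series exposed by both outputs; each would appear twice if the outputs were combined."""
    return sorted(series_set(text_a) & series_set(text_b))


def fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a metrics page; must run somewhere that can reach the pod IP."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With base = "http://10.129.2.15:8080/metrics/web_server", shared_series(fetch(base + "/nodejs"), fetch(base + "/core")) would list the overlapping series; if the two paths really return the same results, that is every series either one exposes.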
So the fix is either to alter the code above or to change the ServiceMonitor endpoints to append /nodejs to the path, for example:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: noobaa-mgmt-service-monitor
  labels:
    app: noobaa
spec:
  endpoints:
    - port: mgmt
      path: /metrics/web_server/nodejs
    - port: mgmt
      path: /metrics/bg_workers
    - port: mgmt
      path: /metrics/hosted_agents
  namespaceSelector: {}
  selector:
    matchLabels:
      noobaa-mgmt-svc: "true"
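Before switching the path, it may be worth confirming that /metrics/web_server/nodejs alone still exposes every series the old /metrics/web_server path did, and none of them twice. A minimal Python sketch, assuming both outputs have been saved as text (for instance with the curl command above, pointed at each path in turn); series_counts and compare_paths are hypothetical helpers.

```python
import re
from collections import Counter

# A metric name runs up to the first whitespace or opening brace.
_NAME = re.compile(r"[^\s{]+")


def series_counts(text: str) -> Counter:
    """Count how often each series (name plus labels) appears in exposition text."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name = _NAME.match(line).group(0)
        labeled = line.startswith("{", len(name))
        counts[line[: line.rfind("}") + 1] if labeled else name] += 1
    return counts


def compare_paths(old_text: str, new_text: str) -> tuple[set[str], dict[str, int]]:
    """Return (series the new path no longer exposes, series still duplicated on it)."""
    old, new = series_counts(old_text), series_counts(new_text)
    lost = set(old) - set(new)
    still_duplicated = {s: n for s, n in new.items() if n > 1}
    return lost, still_duplicated
```

An empty lost set together with an empty duplicate map would suggest the path change is safe for this target; any series listed in lost would stop being collected after the switch.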
Blocks:
DFBUGS-458 [2322896] duplicate metrics being produced (New)
DFBUGS-697 [2321231] [GSS] duplicate metrics being produced (MODIFIED)
Relates to:
DFBUGS-839 rook-ceph-osd-prepare-ocs-deviceset pods produce duplicate metrics (ASSIGNED)