Bug
Resolution: Unresolved
4.16
Important
Description of problem:
Prometheus scraping ODF's noobaa controller results in duplicate samples like below:
ts=2024-12-12T15:24:49.445Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-storage/noobaa-mgmt-service-monitor/0 target=http://10.x.x.x:8080/metrics/web_server msg="Error on ingesting samples with different value but same timestamp" num_dropped=176
ts=2024-12-12T15:24:52.495Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-storage/s3-service-monitor/0 target=http://10.x.x.x:7004/ msg="Error on ingesting samples with different value but same timestamp" num_dropped=148
Version-Release number of selected component (if applicable):
OpenShift 4.16
How reproducible:
Seen on customer's environment, not reproduced in lab at this time
Actual results:
Prometheus receives duplicate metrics (possibly because it is scraping multiple replicas of noobaa?)
Expected results:
Prometheus receives only one copy of each metric
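One way to narrow this down is to save the payload of a single target (for example the /metrics/web_server endpoint from the log above) and check whether the same series appears more than once in it. The sketch below is only an illustration, not part of any ODF or Prometheus tooling: the function name and the sample metric names are hypothetical, and it assumes the duplicates are present within one target's text-format payload, which has not been confirmed for this case. It also compares label sets as raw strings, so duplicates whose labels are ordered differently would be missed.

```python
from collections import defaultdict


def find_duplicate_series(exposition_text):
    """Return {series: [values]} for series that appear more than once.

    Hypothetical helper: a simplified reader for the Prometheus text
    exposition format, not a full parser.
    """
    seen = defaultdict(list)
    for raw in exposition_text.splitlines():
        line = raw.strip()
        # Skip blank lines and HELP/TYPE/comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            end = line.rfind("}")
            if end == -1:
                continue  # malformed line, ignore it
            series, rest = line[: end + 1], line[end + 1:].split()
        else:
            parts = line.split()
            series, rest = parts[0], parts[1:]
        if rest:
            seen[series].append(rest[0])  # first token after the series is the value
    return {s: v for s, v in seen.items() if len(v) > 1}


# Made-up payload; the metric names are placeholders, not real noobaa metrics.
sample = """\
# HELP example_requests_total Hypothetical metric.
# TYPE example_requests_total counter
example_requests_total{bucket="a"} 10
example_requests_total{bucket="b"} 3
example_requests_total{bucket="a"} 12
example_up 1
"""

duplicates = find_duplicate_series(sample)
for series, values in duplicates.items():
    print(series, values)
```

If a payload saved from the affected target reports no duplicates with a check like this, the duplication is more likely to come from two scrape configurations exposing the same series than from the payload itself.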
Additional info:
This was brought up in previous bugs, as it also affected Rook:
https://bugzilla.redhat.com/show_bug.cgi?id=2304076 is for 4.17.
It was cloned for backport in https://bugzilla.redhat.com/show_bug.cgi?id=2322896, which was mentioned as a possible dupe of https://bugzilla.redhat.com/show_bug.cgi?id=2321231.
It was fixed in 4.16.4 for Rook and the bugs were closed, but it was not resolved for noobaa.
Related JIRAs: DFBUGS-458, DFBUGS-697
In DFBUGS-839, it was mentioned that there were 2 separate issues: both noobaa and rook were sending the dupes.
My customer sees scrapes from 2 sources, which are duplicated:
scrape_pool=serviceMonitor/openshift-storage/noobaa-mgmt-service-monitor
scrape_pool=serviceMonitor/openshift-storage/s3-service-monitor
Set Severity to Important to match the customer case, and because the original issues/bugs were closed while the case has been open for some months.
[OCPBUGS-54320] Duplicate noobaa metrics from ODF result in PrometheusDuplicateTimestamps errors
PX Impact Score | Original: 6057 | New: 6069
Component/s | New: Storage [ 12367909 ]
Component/s | Original: Monitoring [ 12367700 ]
Assignee | Original: Jan Fajerski [ jfajersk@redhat.com ]
PX Impact Score | New: 6057
QA Contact | New: Junqi Zhao [ juzhao ]
This is caused by two ServiceMonitors exposing the same metric. One ServiceMonitor should drop the duplicate metric.