[OCPBUGS-54320] Duplicate noobaa metrics from ODF result in PrometheusDuplicateTimestamps errors

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • None
    • Affects Version: 4.16
    • Component: Storage
    • None
    • Severity: Important
    • None
    • False

      Description of problem:

      When Prometheus scrapes ODF's noobaa endpoints, it drops samples as duplicates and logs warnings such as:
            1 ts=2024-12-12T15:24:49.445Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-storage/noobaa-mgmt-service-monitor/0 target=http://10.x.x.x:8080/metrics/web_server msg="Error on ingesting samples with different value but same timestamp" num_dropped=176
            1 ts=2024-12-12T15:24:52.495Z caller=scrape.go:1738 level=warn component="scrape manager" scrape_pool=serviceMonitor/openshift-storage/s3-service-monitor/0 target=http://10.x.x.x:7004/ msg="Error on ingesting samples with different value but same timestamp" num_dropped=148
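
      The warnings above are in logfmt, so the dropped-sample counts can be tallied per scrape pool. The following is a minimal Python sketch and not part of the original report. The function names `parse_logfmt` and `dropped_per_pool` are hypothetical, and the sketch assumes that quoted values contain no unbalanced quotes.

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Split one logfmt line into a dict; tokens without '=' are skipped."""
    fields = {}
    # shlex keeps quoted values such as component="scrape manager" together.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def dropped_per_pool(lines):
    """Sum num_dropped per scrape_pool for duplicate-timestamp warnings."""
    totals = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if "same timestamp" in fields.get("msg", "") and "num_dropped" in fields:
            pool = fields.get("scrape_pool", "unknown")
            totals[pool] += int(fields["num_dropped"])
    return totals
```

      Applied to the two log lines above, this reports 176 dropped samples for the noobaa-mgmt-service-monitor pool and 148 for the s3-service-monitor pool.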

      Version-Release number of selected component (if applicable):

      OpenShift 4.16    

      How reproducible:

          Seen in the customer's environment; not yet reproduced in the lab.

      Actual results:

      Prometheus receives duplicate metrics (possibly because it is scraping multiple noobaa replicas?).
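
      One common way to trigger this warning is a single scrape response that lists the same series more than once with different values. Whether that is what happens here is not confirmed in this issue. The sketch below checks a saved `/metrics` payload for that pattern. It is a simplified parser written for illustration, the name `find_conflicting_series` is hypothetical, and it does not handle every corner of the exposition format.

```python
import re
from collections import defaultdict

# Simplified sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r"^(?P<series>[^\s{]+(?:\{.*\})?)\s+(?P<value>\S+)(?:\s+-?\d+)?\s*$"
)


def find_conflicting_series(payload):
    """Return series that appear more than once with differing values."""
    values = defaultdict(list)
    for raw in payload.splitlines():
        line = raw.strip()
        # Skip blank lines and # HELP / # TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            values[match.group("series")].append(match.group("value"))
    return {s: v for s, v in values.items() if len(set(v)) > 1}
```

      Running it against the output of each target named in the warnings would show whether a single endpoint already emits conflicting series, or whether the duplication only appears once both scrape pools are combined.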

      Expected results:

      Prometheus ingests only one copy of each metric.

      Additional info:

      This was raised in earlier bugs because it also affected Rook:
      https://bugzilla.redhat.com/show_bug.cgi?id=2304076 (for 4.17),
      which was cloned for backport as
      https://bugzilla.redhat.com/show_bug.cgi?id=2322896,
      which in turn was mentioned as a possible duplicate of
      https://bugzilla.redhat.com/show_bug.cgi?id=2321231

      The issue was fixed in 4.16.4 for Rook and those bugs were closed, but it was not resolved for noobaa.
      
      Related JIRAs:
      DFBUGS-458
      DFBUGS-697
      
      DFBUGS-839 noted that there were two separate issues: both noobaa and Rook were sending duplicates. My customer sees duplicated samples from two scrape pools:
      scrape_pool=serviceMonitor/openshift-storage/noobaa-mgmt-service-monitor
      scrape_pool=serviceMonitor/openshift-storage/s3-service-monitor

      Severity is set to Important to match the customer case, and because the original bugs were closed while the case has been open for some months.


            Portfolio Life Cycle Management Automation Bot made changes -
            PX Impact Score Original: 6057 New: 6069
            Jan Fajerski made changes -
            Component/s New: Storage [ 12367909 ]
            Component/s Original: Monitoring [ 12367700 ]
            Assignee Original: Jan Fajerski [ jfajersk@redhat.com ]

            Jan Fajerski added a comment - This is caused by two ServiceMonitors exposing the same metric. One ServiceMonitor should drop the duplicate metric.
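
            The suggestion in this comment can be sketched as follows: find the metric names that both endpoints expose, then build a relabeling rule that drops them on one side. This is a hedged illustration only. The helper names `metric_names` and `drop_rule_for_overlap` are hypothetical, and the keys `sourceLabels`, `regex`, and `action` are the relabeling fields that the Prometheus Operator accepts under `metricRelabelings` on a ServiceMonitor endpoint. The issue does not say which ServiceMonitor should carry the rule.

```python
import re
from typing import Optional


def metric_names(payload):
    """Collect metric names from a text-format /metrics payload."""
    names = set()
    for raw in payload.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            # The name ends at the first '{' or whitespace.
            names.add(re.split(r"[\s{]", line, maxsplit=1)[0])
    return names


def drop_rule_for_overlap(payload_a, payload_b) -> Optional[dict]:
    """Build a metricRelabelings-style drop rule for names exposed by both."""
    shared = sorted(metric_names(payload_a) & metric_names(payload_b))
    if not shared:
        return None
    return {
        "sourceLabels": ["__name__"],
        # Metric names contain no regex metacharacters, so a plain
        # alternation is enough here.
        "regex": "|".join(shared),
        "action": "drop",
    }
```

            The resulting dictionary would need to be serialized into the chosen ServiceMonitor's endpoint definition. Dropping by name alone assumes both endpoints report the same data for the shared metrics, which should be verified before applying such a rule.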
            Portfolio Life Cycle Management Automation Bot made changes -
            PX Impact Score New: 6057
            Steven Walter made changes -
            QA Contact New: Junqi Zhao [ juzhao ]
            Steven Walter created issue -

              Assignee: Unassigned
              Reporter: Steven Walter (rhn-support-stwalter)
              QA Contact: Junqi Zhao
              Votes: 0
              Watchers: 3