Data Foundation Bugs / DFBUGS-642

[2304076] duplicate metrics being produced

    • 4.17.0-117
      Cause:
      A Prometheus client dependency upgrade caused process-level metrics (resident memory, VSS, heap, etc.) to be collected whenever custom metrics are collected.

      Consequence:
      Metrics were duplicated because NooBaa collected process-level metrics separately in addition to the custom metrics. The duplicates triggered an alert in Prometheus.

      Fix:
      Remove the separate process-level collection, since those metrics are now collected together with the custom metrics.

      Result:
      Metrics are no longer reported twice, and the Prometheus alert no longer fires.
    • Bug Fix
    • Approved
    • None

      The following alert is firing: PrometheusDuplicateTimestamps

      Prometheus debug logs indicate that the noobaa-mgmt-service-monitor and s3-service-monitor scrape targets are the source:

      ts=2024-08-09T12:35:11.506Z caller=scrape.go:1777 level=debug component="scrape manager" scrape_pool=serviceMonitor/openshift-storage/noobaa-mgmt-service-monitor/0 target=http://10.129.2.15:8080/metrics/web_server msg="Duplicate sample for timestamp" series=NooBaa_health_status
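
      When many such debug lines are logged, it can help to pull out the scrape pool, target and series from each one. The following is a minimal sketch, assuming the lines are in the key=value (logfmt) form shown above; the helper name parse_logfmt is hypothetical and not part of Prometheus or NooBaa.

```python
import shlex


def parse_logfmt(line: str) -> dict[str, str]:
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    # shlex keeps 'component="scrape manager"' together as one token
    # and strips the surrounding quotes.
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


# The debug line quoted in this issue.
line = (
    'ts=2024-08-09T12:35:11.506Z caller=scrape.go:1777 level=debug '
    'component="scrape manager" '
    'scrape_pool=serviceMonitor/openshift-storage/noobaa-mgmt-service-monitor/0 '
    'target=http://10.129.2.15:8080/metrics/web_server '
    'msg="Duplicate sample for timestamp" series=NooBaa_health_status'
)

fields = parse_logfmt(line)
print(fields["scrape_pool"], fields["target"], fields["series"])
```

      Grouping the parsed lines by scrape_pool would show which service monitors are affected.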

      This can be reproduced by fetching the metrics manually with curl:

      oc exec prometheus-k8s-1 -- curl "http://10.129.2.15:8080/metrics/web_server" > metrics.txt

      The response does contain duplicate metrics; searching metrics.txt for 'NooBaa_health_status 0', for example, finds it more than once.
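
      Instead of searching for one metric at a time, the whole dump can be checked for repeated series. This is a minimal sketch, assuming the file is in the Prometheus text exposition format; the function name duplicate_series, the HELP text and the second metric in the sample are made up for illustration.

```python
from collections import Counter


def duplicate_series(text: str) -> dict[str, int]:
    """Return each series (metric name plus labels) that appears more than once."""
    counts = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            # Series identity ends at the closing brace of the label set.
            series = line[: line.rindex("}") + 1]
        else:
            # No labels: the series is the first whitespace-separated field.
            series = line.split()[0]
        counts[series] += 1
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical sample that mimics the duplication described above.
sample = """\
# HELP NooBaa_health_status Illustrative help text
# TYPE NooBaa_health_status gauge
NooBaa_health_status 0
example_other_metric 3
NooBaa_health_status 0
"""

dups = duplicate_series(sample)
print(dups)
```

      To run it against the real dump, read the file first, for example duplicate_series(open("metrics.txt").read()).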

      Upon investigation, this is caused by the following block:
      https://github.com/noobaa/noobaa-core/blob/ad73e9cb3bd483f6f34de9a28a9f4ba3ea060eb3/src/server/analytic_services/prometheus_reporting.js#L44

      If I call /metrics/web_server/nodejs and /metrics/web_server/core separately, they return the same results.
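
      The overlap between the two endpoints can be confirmed by comparing the series each one returns. This is a minimal sketch that works on the response bodies as strings; the function names and the sample payloads are hypothetical, and the real bodies would come from curl calls like the one above.

```python
def series_names(text: str) -> set[str]:
    """Collect the series (metric name plus labels) present in one payload."""
    names = set()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        if "}" in line:
            names.add(line[: line.rindex("}") + 1])
        else:
            names.add(line.split()[0])
    return names


def shared_series(a: str, b: str) -> list[str]:
    """Return the series that appear in both payloads, sorted."""
    return sorted(series_names(a) & series_names(b))


# Hypothetical payloads standing in for the /nodejs and /core responses.
nodejs_payload = 'NooBaa_health_status 0\nexample_metric{kind="a"} 1\n'
core_payload = (
    "# TYPE NooBaa_health_status gauge\n"
    'NooBaa_health_status 0\nexample_metric{kind="a"} 1\n'
)

overlap = shared_series(nodejs_payload, core_payload)
print(overlap)
```

      Any series in the overlap would be reported twice by an endpoint that serves both payloads together.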

      So the solution is either to alter the code above or to change the service monitor endpoints to append /nodejs to the path.

      For example:
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        name: noobaa-mgmt-service-monitor
        labels:
          app: noobaa
      spec:
        endpoints:
        - port: mgmt
          path: /metrics/web_server/nodejs
        - port: mgmt
          path: /metrics/bg_workers
        - port: mgmt
          path: /metrics/hosted_agents
        namespaceSelector: {}
        selector:
          matchLabels:
            noobaa-mgmt-svc: "true"

              rh-ee-achouhan Aayush Chouhan
              ian-demolab Ian Watson (Inactive)
              Nimrod Becker
              Sagi Hirshfeld Sagi Hirshfeld