OpenShift Monitoring / MON-3256

Improve scrape sample alerts


    • Type: Epic
    • Resolution: Done
    • Priority: Normal
    • Epic Name: scrape sample limit alerts
    • Progress: 0% To Do, 0% In Progress, 100% Done

      Our documentation suggests creating an alert after configuring scrape sample limits.

      That PrometheusRule object has two alerts configured within it [1]:

      `ApproachingEnforcedSamplesLimit` 

      `TargetDown` 

      The `TargetDown` alert is designed to fire after the `ApproachingEnforcedSamplesLimit` alert, because the target is dropped once the enforced sample limit is reached.
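
      For context, a rough sketch of what that documented PrometheusRule looks like is below. The object name, the `ns1` namespace and the hard-coded 50000 sample limit are illustrative assumptions rather than the exact manifest from the docs:

      ```yaml
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: monitoring-stack-alerts   # illustrative name/namespace
        namespace: ns1
        labels:
          prometheus: k8s
          role: alert-rules
      spec:
        groups:
        - name: general.rules
          rules:
          # Warns while ingestion is still below the enforced limit
          # (50000 stands in for the configured enforcedSampleLimit).
          - alert: ApproachingEnforcedSamplesLimit
            expr: scrape_samples_scraped / 50000 > 0.8
            for: 10m
            labels:
              severity: warning
          # Fires once a target is dropped, i.e. after the limit has been exceeded,
          # but also for any other reason a target in the namespace goes down.
          - alert: TargetDown
            expr: 100 * (count(up == 0) BY (job, namespace, service) / count(up) BY (job, namespace, service)) > 10
            for: 10m
            labels:
              severity: warning
      ```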

      The `TargetDown` alert is creating false positives: it fires for reasons other than pods in the namespace having reached their enforced sample limit (e.g. the metrics endpoint may be down).

      User-defined monitoring should provide out-of-the-box metrics that will help with troubleshooting:

      • Update the user-workload Prometheus to enable the additional scrape metrics [2]
      • Rewrite the ApproachingEnforcedSamplesLimit alert expression in the OCP documentation to something like `(scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9` (which reads as "alert when the number of ingested samples reaches 90% of the configured limit"); see the sketch after this list.
      • Document how a user would know that a target has hit the limit (e.g. the Targets page should have the information).
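
      A minimal sketch of how the last two points could fit together is below. It assumes the `extra-scrape-metrics` feature flag is turned on for the user-workload Prometheus (the Prometheus operator exposes feature flags via `spec.enableFeatures`; in OCP this would be done by the cluster-monitoring-operator rather than by hand), which is what makes the `scrape_sample_limit` metric available. Object names, namespace and the annotation text are illustrative:

      ```yaml
      # Assumed way the feature flag would be enabled on the user-workload Prometheus.
      apiVersion: monitoring.coreos.com/v1
      kind: Prometheus
      metadata:
        name: user-workload
        namespace: openshift-user-workload-monitoring
      spec:
        enableFeatures:
        - extra-scrape-metrics
      ---
      # Rewritten alert: only yields a result for targets that actually have a
      # sample limit configured (scrape_sample_limit > 0), so it cannot fire for
      # unrelated reasons the way TargetDown does.
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: scrape-sample-alerts   # illustrative
        namespace: ns1
      spec:
        groups:
        - name: scrape-sample-limit.rules
          rules:
          - alert: ApproachingEnforcedSamplesLimit
            expr: (scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9
            for: 10m
            labels:
              severity: warning
            annotations:
              description: >-
                {{ $labels.instance }} has ingested {{ $value | humanizePercentage }}
                of its enforced sample limit.
      ```

      Once a target actually hits the limit it shows up as down on the Targets page with a scrape error along the lines of "sample limit exceeded", which is the kind of breadcrumb the last bullet asks us to document.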

      [1] - https://docs.openshift.com/container-platform/4.12/monitoring/configuring-the-monitoring-stack.html#creating-scrape-sample-alerts_configuring-the-monitoring-stack 

      [2] - https://prometheus.io/docs/prometheus/latest/feature_flags/#extra-scrape-metrics

        Attachments:
          1. alerting_tab.png (323 kB)
          2. ApproachingEnforcedSampleLimit-alertfiring.png (328 kB)
          3. ApproachingEnforcedSamplesLimit.png (364 kB)
          4. pending_alerts.png (376 kB)
          5. query_result_greater_than_zero.png (207 kB)
          6. query_result_without_filtering.png (385 kB)
          7. sample_limit_exceeded.png (194 kB)
          8. sample_scrape_limit.png (346 kB)
          9. user-alerts.png (323 kB)
          10. user-alerts-firing.png (303 kB)

              People: Jayapriya Pai, Roger Florén, Tai Gao