Observability Documentation / OBSDOCS-977

Document improved scrape sample alerts


    • OBSDOCS (May 6 - May 28) #254

      Our documentation suggests creating an alert after configuring scrape sample limits.

      The `PrometheusRule` object in that procedure has two alerts configured within it [1]:

      `ApproachingEnforcedSamplesLimit` 

      `TargetDown` 

      The `TargetDown` alert is designed to fire after the `ApproachingEnforcedSamplesLimit` alert, because the target is dropped once the enforced sample limit is reached.

      The `TargetDown` alert is creating false positives: it fires for reasons other than pods in the namespace reaching their enforced sample limit (e.g. the metrics endpoint may be down).

      User-defined monitoring should provide out-of-the-box metrics that will help with troubleshooting:

      • Update Prometheus user-workload to enable additional scrape metrics [2]
      • Rewrite the `ApproachingEnforcedSamplesLimit` alert expression in the OCP documentation to something like "(scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9", which reads as "alert when the number of ingested samples exceeds 90% of the configured limit".
      • Document how a user would know that a target has hit the limit (e.g. the Targets page should have the information).
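
A minimal sketch of what the rewritten alert from the second item might look like as a `PrometheusRule` object. The expression is the one proposed above; the object name, namespace, group name, `for` duration, severity, and annotation text are placeholders chosen for illustration, not values taken from the existing documentation. It assumes the extra scrape metrics feature [2] is enabled, since `scrape_sample_limit` is only exposed in that case.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: scrape-sample-alerts        # hypothetical name
  namespace: ns1                    # hypothetical user project
spec:
  groups:
  - name: scrape-samples.rules      # hypothetical group name
    rules:
    - alert: ApproachingEnforcedSamplesLimit
      # The "> 0" filter drops targets that have no sample limit configured,
      # so the division never uses a zero limit.
      expr: (scrape_samples_post_metric_relabeling / (scrape_sample_limit > 0)) > 0.9
      for: 10m                      # assumed duration
      labels:
        severity: warning           # assumed severity
      annotations:
        message: >-
          Target {{ $labels.job }}/{{ $labels.instance }} is using
          {{ $value | humanizePercentage }} of its enforced sample limit.
```

Because the expression compares each target against its own limit, an alert of this shape would fire per target and would not need a separate `TargetDown` rule to indicate that the limit is the cause.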

      [1] - https://docs.openshift.com/container-platform/4.12/monitoring/configuring-the-monitoring-stack.html#creating-scrape-sample-alerts_configuring-the-monitoring-stack 

      [2] - https://prometheus.io/docs/prometheus/latest/feature_flags/#extra-scrape-metrics

        1. alerting_tab.png (323 kB)
        2. ApproachingEnforcedSampleLimit-alertfiring.png (328 kB)
        3. ApproachingEnforcedSamplesLimit.png (364 kB)
        4. pending_alerts.png (376 kB)
        5. query_result_greater_than_zero.png (207 kB)
        6. query_result_without_filtering.png (385 kB)
        7. sample_limit_exceeded.png (194 kB)
        8. sample_scrape_limit.png (346 kB)
        9. user-alerts.png (323 kB)
        10. user-alerts-firing.png (303 kB)

            eromanov@redhat.com Eliska Romanova
            rhn-support-bburt Brian Burt