Docs for Red Hat Developers / RHDEVDOCS-3037

Create warning alerts to prevent users from reaching disk watermark thresholds

Details

      * This release adds new alerts that warn cluster administrators when the cluster is predicted to reach a disk watermark threshold. The alerts consider the data of the previous hours and use a linear model to predict whether the cluster will reach the threshold within the next 6 hours.
      Currently, the existing alerts fire only once the cluster has already reached a disk watermark threshold, at which point cluster administrators have to take critical steps.
      The new warning alerts notify cluster administrators well before the cluster reaches a disk watermark threshold, so that they can take preventive steps in time.

    Description

      Currently we have alerts that fire only once a customer has already reached a disk watermark threshold. At that point, they already have critical steps to take.

       

      We should adjust our alerts to give users a warning-level heads-up that they will reach a threshold within a given amount of time, based on the current trend.

       

      Notes:

      https://prometheus.io/docs/prometheus/latest/querying/functions/#predict_linear

      https://github.com/openshift/elasticsearch-operator/blob/master/files/prometheus_alerts.yml#L47
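
      To illustrate the trend-based prediction described above, here is a minimal Python sketch of a least-squares linear extrapolation in the spirit of Prometheus's predict_linear. It is not Prometheus's implementation: the function name, the sample layout, and the choice to extrapolate from the last sample rather than from the evaluation time are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def predict_linear(samples: Sequence[Tuple[float, float]], horizon_s: float) -> float:
    """Fit a least-squares line to (timestamp_s, value) samples and extrapolate.

    Returns the value predicted horizon_s seconds after the last sample.
    Illustrative only; simplified relative to the PromQL function.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")

    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n

    # Variance of the timestamps; zero means all samples share one timestamp.
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one timestamp")

    # Ordinary least-squares slope and intercept.
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t

    last_t = samples[-1][0]
    return intercept + slope * (last_t + horizon_s)


if __name__ == "__main__":
    # Hypothetical data: one hour of disk usage (percent), sampled every 5 minutes,
    # rising 0.25 percentage points per sample from 70%.
    hour_of_data = [(i * 300.0, 70.0 + 0.25 * i) for i in range(13)]
    print(predict_linear(hour_of_data, 6 * 3600))
```

      With this made-up series, usage is 73% at the end of the hour and the fitted line puts it at about 91% six hours later, which is the kind of trend the new alerts are meant to catch early.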

       

      Acceptance Criteria:

      • We provide a warning that the cluster will reach the low watermark threshold within a reasonable amount of time (6 hrs?)
      • We provide a more severe alert that the cluster will reach the high watermark threshold within a reasonable amount of time (6 hrs?)
      • We provide an actionable entry within the runbook for when the low watermark threshold will be met
      • We provide an actionable entry within the runbook for when the high watermark threshold will be met
      • Ensure that the alerts that currently exist inhibit these new alerts (so that we aren't getting multiple alerts for the same issue)
      • Create an initial unit test for the linear prediction (needed because the alerts require ~1 hr of data before they can fire properly). See https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/
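
      The acceptance criteria above can be sketched as a small decision function: given current and predicted disk usage, decide which alerts should fire, with an already-reached alert suppressing the matching prediction alert. This is only an illustration of the intended behavior, not the operator's rules. The alert names are hypothetical, and the 85% and 90% thresholds are assumed to be the Elasticsearch default low and high watermarks.

```python
from typing import List

# Assumed thresholds (percent of disk used); a real cluster may override them.
LOW_WATERMARK_PCT = 85.0
HIGH_WATERMARK_PCT = 90.0


def firing_alerts(current_pct: float, predicted_pct: float) -> List[str]:
    """Return the (hypothetical) alert names that should fire.

    current_pct:   disk usage right now, in percent.
    predicted_pct: disk usage predicted for the end of the look-ahead
                   window (for example 6 hours), in percent.
    """
    alerts = []
    for level, threshold in (("High", HIGH_WATERMARK_PCT), ("Low", LOW_WATERMARK_PCT)):
        if current_pct >= threshold:
            # The existing alert fires and inhibits the prediction alert
            # for the same watermark, so only one alert covers the issue.
            alerts.append(f"Disk{level}WatermarkReached")
        elif predicted_pct >= threshold:
            alerts.append(f"Disk{level}WatermarkPredicted")
    return alerts


if __name__ == "__main__":
    print(firing_alerts(73.0, 91.0))  # both watermarks predicted, none reached
    print(firing_alerts(86.0, 91.0))  # low reached, high still only predicted
```

      A unit test for the real rules would need to feed in roughly an hour of series data before asserting on the prediction alerts. The function above takes the predicted value as an input so that the inhibition logic can be checked on its own.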

            People

              Unassigned
              Eric Wolinetz (Inactive), ewolinet@redhat.com
              Qiaoling Tang