-
Task
-
Resolution: Done
-
Major
-
None
Currently, our alerts fire only once a customer has already reached a disk watermark threshold. By that point, they have critical steps to take to recover.
We should adjust our alerts to give users a heads-up (a warning) that they will reach a threshold within a given amount of time, based on the current trend.
Notes:
https://prometheus.io/docs/prometheus/latest/querying/functions/#predict_linear
https://github.com/openshift/elasticsearch-operator/blob/master/files/prometheus_alerts.yml#L47
Acceptance Criteria:
- We provide a warning that the cluster will reach the low watermark threshold within a reasonable amount of time (6 hrs?)
- We provide a more severe alert that the cluster will reach the high watermark threshold within a reasonable amount of time (6 hrs?)
- We provide an actionable entry within the runbook for when the low watermark threshold will be met
- We provide an actionable entry within the runbook for when the high watermark threshold will be met
- Ensure that the alerts that currently exist inhibit these new alerts (so that we aren't getting multiple alerts for the same issue)
- Create an initial unit test for the linear prediction (the alerts need ~1 hr of data before they can properly fire): https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/
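
To make the trend-based criteria above concrete, here is a minimal Python sketch of the idea behind predict_linear: fit a least-squares line through recent free-space samples and extrapolate it a fixed horizon past the evaluation time. It is an illustration only, not the operator's rule and not Prometheus's implementation. The function names, the 100 GiB disk, the sample data, and the 6 hr horizon are all assumptions. The 85% and 90% watermarks are the Elasticsearch defaults and may be configured differently on a given cluster.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, free bytes)


def predict_linear(samples: Sequence[Sample], eval_time: float, horizon_s: float) -> float:
    """Extrapolate a least-squares line to eval_time + horizon_s.

    Hypothetical helper that mirrors the idea of PromQL's predict_linear.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    # Express timestamps relative to the evaluation time.
    xs = [t - eval_time for t, _ in samples]
    ys = [v for _, v in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples share the same timestamp")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * horizon_s


def will_reach_watermark(
    samples: Sequence[Sample],
    eval_time: float,
    total_bytes: float,
    watermark_used_fraction: float,
    horizon_s: float = 6 * 3600,  # assumed horizon, per the "(6 hrs?)" above
) -> bool:
    """Return True if projected disk usage meets the watermark within the horizon."""
    predicted_free = predict_linear(samples, eval_time, horizon_s)
    predicted_used_fraction = 1.0 - predicted_free / total_bytes
    return predicted_used_fraction >= watermark_used_fraction


if __name__ == "__main__":
    GIB = 2 ** 30
    total = 100 * GIB  # assumed disk size
    # One hour of made-up samples, one per minute: 30 GiB free, shrinking 2.5 GiB/hr.
    history = [(float(t), (30 - 2.5 * t / 3600) * GIB) for t in range(0, 3601, 60)]
    print("low watermark (85%) within 6h:", will_reach_watermark(history, 3600.0, total, 0.85))
    print("high watermark (90%) within 6h:", will_reach_watermark(history, 3600.0, total, 0.90))
```

With these made-up numbers the projection lands at roughly 87.5% used six hours out, so the low-watermark warning would fire and the high-watermark alert would not. A promtool unit test for the real rule could feed a similar one-hour series and assert the same pair of outcomes.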
- is documented by
-
RHDEVDOCS-3037 Create warning alerts to prevent users from reaching disk watermark thresholds
- Closed
- links to