Type: Task
Resolution: Done
Priority: Major
Fix Version/s: Logging 5.1
Affects Version/s: None
Component/s: Log Storage
Labels:
- rn-done-resolved
- tc-approved

Story Points:
2
Blocked:
False
Ready:
False
Epic Link:
[ES] Deep Insights
Docs QE Status:
NEW
QE Status:
NEW
Release Note Text:

This release adds the following new `ElasticsearchNodeDiskWatermarkReached` warnings to the OpenShift Elasticsearch Operator (EO):
- Elasticsearch Node Disk Low Watermark Reached
- Elasticsearch Node Disk High Watermark Reached
- Elasticsearch Node Disk Flood Watermark Reached

The EO issues these warnings when it predicts that an Elasticsearch node will reach the `Disk Low Watermark`, `Disk High Watermark`, or `Disk Flood Stage Watermark` thresholds in the next 6 hours. This warning period gives you time to respond before the node reaches the disk watermark thresholds. The warning messages also provide links to the troubleshooting steps, which you can follow to help mitigate the issue. The EO applies the past several hours of disk space data to a linear model to generate these warnings.

Market:

Sprint:
Logging (LogExp) - Sprint 201

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Currently, our alerts fire only after a customer has already reached a disk watermark threshold. By that point, the customer has critical steps to take.

We should adjust our alerts to give users an advance warning that they will reach a threshold within a given amount of time, based on the current trend.

Notes:

https://prometheus.io/docs/prometheus/latest/querying/functions/#predict_linear

https://github.com/openshift/elasticsearch-operator/blob/master/files/prometheus_alerts.yml#L47
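
To illustrate the idea behind `predict_linear` linked above, here is a minimal Python sketch of a least-squares extrapolation over recent disk usage samples. It is not the operator's alert rule and not Prometheus's implementation. The function names, the one-sample-per-minute data, and the extrapolation from the last sample's timestamp are assumptions made for illustration. The 85/90/95 percent thresholds are the Elasticsearch default low, high, and flood stage watermarks; a cluster may configure different values.

```python
from typing import Sequence, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, disk used in percent)


def predict_linear(samples: Sequence[Sample], horizon_s: float) -> float:
    """Fit a least-squares line through the samples and return the value
    it predicts horizon_s seconds after the last sample (assumption: the
    last sample stands in for "now")."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("all samples share the same timestamp")
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    slope = cov / var_t
    intercept = mean_v - slope * mean_t
    return slope * (samples[-1][0] + horizon_s) + intercept


def will_reach_watermark(samples: Sequence[Sample],
                         watermark_pct: float,
                         horizon_s: float = 6 * 3600) -> bool:
    """True if the fitted trend reaches the watermark within the horizon."""
    return predict_linear(samples, horizon_s) >= watermark_pct


# Hypothetical data: one hour of samples, one per minute, starting at
# 70% used and growing by 0.05 percentage points per minute.
samples = [(60.0 * i, 70.0 + 0.05 * i) for i in range(60)]

# Elasticsearch default watermarks (assumed; clusters can override them).
for name, pct in (("low", 85.0), ("high", 90.0), ("flood stage", 95.0)):
    print(name, will_reach_watermark(samples, pct))
```

With this sample data the trend reaches the low and high watermarks within 6 hours but not the flood stage watermark, so only the first two warnings would apply.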

Acceptance Criteria:

- We provide a warning that the cluster will reach the low watermark threshold within a reasonable amount of time (6 hrs?)
- We provide a more severe alert that the cluster will reach the high watermark threshold within a reasonable amount of time (6 hrs?)
- We provide an actionable entry within the runbook for when the low watermark threshold will be met
- We provide an actionable entry within the runbook for when the high watermark threshold will be met
- Ensure that the existing alerts inhibit these new alerts, so that we aren't getting multiple alerts for the same issue
- Create an initial unit test for the linear prediction, since the new alerts require ~1 hr of data to fire properly: https://prometheus.io/docs/prometheus/latest/configuration/unit_testing_rules/

is documented by

RHDEVDOCS-3037 Create warning alerts to prevent users from reaching disk watermark thresholds

Closed

links to

openshift/elasticsearch-operator#694: Log-1100: Warning alerts to prevent users from reaching disk watermark thresholds

Assignee: Sashank Agarwal (Inactive)

Reporter: Eric Wolinetz (Inactive)

QA Contact: Qiaoling Tang

Votes: 0

Watchers: 7

Created: 2021/02/11 12:46 PM

Updated: 2022/03/16 3:27 PM

Resolved: 2021/05/05 4:06 AM
