Type: Task
Resolution: Done
Priority: Undefined
Fix Version/s: Logging 5.5.0
Affects Version/s: None
Component/s: Log Storage
Labels:
- collab

Story Points:
3
Blocked:
False
Ready:
False
Epic Link:
Loki Operator GA
Docs QE Status:
NEW
Feature Link:
OBSDA-7 - Adopting Loki as an alternative to Elasticsearch to support more lightweight, easier to manage/operate storage scenarios
QE Status:
VERIFIED
Market:

Sprint:
Logging (LogExp) - Sprint 211, Logging (LogExp) - Sprint 212, Logging (LogExp) - Sprint 214, Logging (LogExp) - Sprint 215

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

As an OpenShift administrator, I want to receive alerts for upcoming or present unhealthy conditions of the Loki cluster so that I can proactively take countermeasures for recovery.

Acceptance criteria

Define a set of alerts per component for the most common unhealthy conditions of a Loki cluster.
Each alert is registered via a PrometheusRule in OpenShift cluster monitoring.
Alerts fire in the OpenShift cluster monitoring Alertmanager when their triggers are active.

Notes

Investigate upstream for Grafana-provided alerts on Loki cluster health.
Investigate whether any alerts require custom recording rules to execute complex aggregations.
Compile a document listing the name, description, and purpose of each alert, as well as the recommended thresholds at which it should activate.
Enhance the document with recommendations on how to aggregate the alerts per path where possible, e.g. ingestion alerts vs. querying alerts.
Provide the final list of alerts as a static, reconcilable PrometheusRule custom resource per LokiStack instance in the Loki Operator.
Provide Prometheus rule unit tests for the final set of alerts (for an example of how to test alerts and rules, see https://github.com/openshift/elasticsearch-operator/blob/master/test/files/prometheus-unit-tests/test.yml).
Initial investigation work on what alert types we want to have is here: https://docs.google.com/document/d/1-hJ8l-sQPVBcdCNXUsIF-0FCNbf1F18O/edit
The enhancement proposal document is available here: https://github.com/openshift/enhancements/blob/master/enhancements/cluster-logging/loki-observability.md
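
The note on aggregating alerts per path could be sketched roughly as follows. This is only an illustration under stated assumptions: the alert names, the Alert type, and the component-to-path mapping are hypothetical and are not taken from the Loki Operator or the enhancement proposal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str       # hypothetical alert name, not one shipped by the operator
    component: str  # Loki component the alert watches
    severity: str


# Assumed mapping of components to the ingestion or querying path.
# A real deployment may classify components differently.
COMPONENT_PATH = {
    "distributor": "ingestion",
    "ingester": "ingestion",
    "query-frontend": "querying",
    "querier": "querying",
}


def group_by_path(alerts):
    """Group alert names by path; unmapped components land under 'other'."""
    groups = defaultdict(list)
    for alert in alerts:
        path = COMPONENT_PATH.get(alert.component, "other")
        groups[path].append(alert.name)
    return dict(groups)


alerts = [
    Alert("ExampleIngesterFlushFailures", "ingester", "critical"),
    Alert("ExampleDistributorHighErrorRate", "distributor", "warning"),
    Alert("ExampleQuerierHighLatency", "querier", "warning"),
    Alert("ExampleCompactorStalled", "compactor", "warning"),
]

grouped = group_by_path(alerts)
for path, names in grouped.items():
    print(path, names)
```

Keeping the mapping in a single table means a component can be reclassified without touching the alert definitions themselves.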

is related to

LOG-2314 [loki-operator] Add a rule to run the prometheus rules tests in github action

Closed

LOG-2339 [loki-operator] Lokistack alerts status

Closed

relates to

LOG-1815 Enhancement proposal: Add alerts and rules for operator-managed LokiStack

Closed

Assignee: Unassigned

Reporter: Periklis Tsirakidis

QA Contact: Kabir Bharti

Votes: 0

Watchers: 4

Created: 2021/10/05 5:08 AM

Updated: 2022/09/09 7:16 AM

Resolved: 2022/08/10 2:07 PM
