-
Bug
-
Resolution: Done
-
Minor
-
OBSDOCS (May 30-June 20) #237, OBSDOCS (June 20-July 10) #238
Description of problem:
https://docs.openshift.com/container-platform/4.11/monitoring/managing-alerts.html#applying-custom-alertmanager-configuration_managing-alerts
https://access.redhat.com/documentation/en-us/openshift_container_platform/4.11/html/monitoring/managing-alerts#applying-custom-alertmanager-configuration_managing-alerts
A sample "Applying a custom Alertmanager configuration" looks like this:
alertmanager.yaml
```
global:
  resolve_timeout: 5m
route:
  group_wait: 30s
  group_interval: 5m      # <== here
  repeat_interval: 12h
  receiver: default
  routes:
  - match:
      alertname: Watchdog
    repeat_interval: 5m   # <== here
    receiver: watchdog
# ... snip ...
```
This sample sets group_interval and the Watchdog route's repeat_interval to the same value (5m), which is known to cause a race condition. As a result, the alert is actually sent at intervals of either 5 or 10 minutes, seemingly at random. This creates unnecessary confusion for customers. Please update the sample to avoid the race condition.
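The race described above can be illustrated with a small timing model. This is only a sketch under stated assumptions, not Alertmanager's actual code: it assumes a group is re-evaluated once per group_interval, that a repeat is sent only when strictly more than repeat_interval has elapsed since the last recorded notification, and that the notification timestamp is recorded slightly after the flush that triggered it. The function name `notification_gaps` and the parameters `flush_offsets` and `send_latency` are hypothetical, introduced for illustration.

```python
def notification_gaps(group_interval, repeat_interval, flush_offsets, send_latency=0.5):
    """Return the nominal gaps (in seconds) between consecutive notifications.

    Simplified model (assumption, not Alertmanager source code):
    - flush number k happens at k * group_interval + flush_offsets[k]
    - a notification goes out on a flush only if none was sent yet, or if
      more than repeat_interval has passed since the last recorded send
    - the send is recorded send_latency seconds after the flush
    """
    last_sent = None
    sent_ticks = []
    for tick, offset in enumerate(flush_offsets):
        now = tick * group_interval + offset
        if last_sent is None or now - last_sent > repeat_interval:
            sent_ticks.append(tick)
            last_sent = now + send_latency
    # Report gaps on the nominal flush schedule, ignoring sub-second jitter.
    return [(b - a) * group_interval for a, b in zip(sent_ticks, sent_ticks[1:])]


# Hypothetical sub-second timing drift for six consecutive flushes.
offsets = [0.0, 0.2, 0.9, 1.0, 1.1, 2.0]

# group_interval == repeat_interval (5m): gaps mix 10 and 5 minutes.
print(notification_gaps(300, 300, offsets))  # [600, 600, 300]

# repeat_interval (2m) shorter than group_interval (5m): steady 5 minutes.
print(notification_gaps(300, 120, offsets))  # [300, 300, 300, 300, 300]

# group_interval 1m, repeat_interval 5m, no drift: repeats land on 6 minutes.
print(notification_gaps(60, 300, [0.0] * 13))  # [360, 360]
```

In this model, whenever repeat_interval is an exact multiple of group_interval, the elapsed time at the decisive flush sits within a fraction of a second of the threshold, so small timing differences decide whether the notification goes out on that flush or is deferred to the next one.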
Version-Release number of selected component (if applicable):
4.11
Actual results:
- match:
    alertname: Watchdog
  repeat_interval: 5m
  receiver: watchdog
This sample sends the alert at intervals of either 5 or 10 minutes, seemingly at random.
Expected results:
Need a sample that correctly alerts me every 5 minutes.
We know from some investigation and customer feedback that neither of the following patterns triggers alerts exactly every 5 minutes.
- match:
    alertname: Watchdog
  repeat_interval: 1h
  receiver: watchdog
and
- match:
    alertname: Watchdog
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 5m
  receiver: watchdog
- If this is an implementation-based limitation, there should at least be a warning in the documentation.
Additional info:
Need QE to verify that race conditions do not occur.
- is cloned by: RHDEVDOCS-4896 Clarify meaning of repeat_interval setting in Alertmanager config (Closed)
- links to
1. SME Review | Closed | Brian Burt
2. QE Review | Closed | Brian Burt
3. Peer Review | Closed | Brian Burt