- Epic
- Resolution: Done
- Major
- None
- None
- Improve Integrations alerts and runbooks
- Quality / Stability / Reliability
- False
- False
- Unset
- In Progress
- 0% To Do, 0% In Progress, 100% Done
- Alerts need a proper severity. App SRE should be notified or paged when relevant.
- Alerts need an actionable runbook. It should give engineers who are not familiar with a service all the information required to deal with an incident. In particular, it needs to include the customer impact, clear and detailed actions to perform as part of the incident response, and, if relevant, contacts or teams to talk to.
- Alerts should not create noise in Slack. When an alert is noisy, we should do our best to fix the cause rather than simply ignore it.
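
The first two requirements above could be checked mechanically during an audit. The following is a minimal sketch only: the `AlertRule` shape, the `severity` label, the `runbook` annotation and the set of allowed severity values are all hypothetical names chosen for illustration, not the actual alert format used by the Integrations services.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical severity levels; the epic does not name the real ones.
ALLOWED_SEVERITIES = {"critical", "high", "medium"}


@dataclass
class AlertRule:
    """Illustrative stand-in for an alert definition (assumed shape)."""
    name: str
    labels: Dict[str, str] = field(default_factory=dict)
    annotations: Dict[str, str] = field(default_factory=dict)


def audit_alert(rule: AlertRule) -> List[str]:
    """Return a list of problems found for one alert; empty means it passes."""
    problems = []
    severity = rule.labels.get("severity")
    if severity not in ALLOWED_SEVERITIES:
        problems.append(f"{rule.name}: missing or unknown severity {severity!r}")
    if not rule.annotations.get("runbook"):
        problems.append(f"{rule.name}: no runbook link")
    return problems
```

Calling `audit_alert` on every rule and collecting the non-empty results would give a list of alerts that still need a severity or a runbook. Whether a runbook is actually actionable still needs a human review.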
- is related to: RHCLOUD-40082 Look at the runbook and note in the spreadsheet if we believe it's helpful (New)
- split from: RHCLOUD-34470 Audit Integration services monitoring and observability (Closed)
- split to: RHCLOUD-40918 Improve Integrations runbooks and increase alert severities (In Progress)