-
Story
-
Resolution: Done
-
Major
-
1.11.0-3
-
IDH Sprint 1.11
In a recent incident, the MT-SRE team pointed out that the logic for firing RHODS error budget burn alerts is causing them pager fatigue. See the linked RCA doc, and the Slack thread linked from it, for more context.
Work with the MT-SRE team to decide on and implement a better set of alerts for when our services (Dashboard and JupyterHub server) are unavailable.
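As a starting point for that discussion, one commonly used way to reduce paging noise from error budget alerts is to page only when the burn rate is high over both a long and a short window, so that brief blips and already-recovered incidents do not page. The sketch below is illustrative only: the 99.9% SLO target, the 14.4 burn-rate threshold, and the function names are assumptions, not the current RHODS alert rules or anything agreed with MT-SRE.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A burn rate of 1.0 means the budget would be used up exactly
    at the end of the SLO period; higher values exhaust it sooner.
    """
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / error_budget


def should_page(
    long_window_error_ratio: float,
    short_window_error_ratio: float,
    slo_target: float = 0.999,  # assumed SLO target, for illustration
    threshold: float = 14.4,    # assumed threshold, for illustration
) -> bool:
    """Page only if BOTH windows exceed the burn-rate threshold.

    The long window (e.g. 1h) shows the burn is significant; the short
    window (e.g. 5m) shows it is still happening right now.
    """
    long_burn = burn_rate(long_window_error_ratio, slo_target)
    short_burn = burn_rate(short_window_error_ratio, slo_target)
    return long_burn >= threshold and short_burn >= threshold
```

For example, `should_page(0.02, 0.02)` pages because both windows show a sustained 2% error ratio, while `should_page(0.02, 0.0)` does not, because the short window shows the service has already recovered.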
- relates to: RHODS-3068 Ensure correct mapping of alert severity between SOPs, alerts, and pages (Closed)