Red Hat OpenShift Data Science / RHODS-3443

Refine alerting rules and severity for RHODS error burn rates


    • IDH Sprint 1.11

      During a recent incident, the MT-SRE team pointed out that the current logic for firing RHODS error budget burn alerts is causing them pager fatigue. See the linked RCA doc, and the Slack thread linked from it, for more context.

      Work with the MT-SRE team to decide on and implement a better set of alerts for when our services (Dashboard and JupyterHub server) are unavailable.
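
      As a starting point for that discussion, here is a minimal sketch of one common way to cut pager noise from burn-rate alerts: require both a long and a short window to exceed the same burn-rate threshold before firing, and page only on fast burns. This is not the logic currently deployed for RHODS, nor anything agreed with MT-SRE. The window pairs, thresholds, and severity names are illustrative assumptions in the style of the Google SRE Workbook, and `BurnRateRule`, `burn_rate`, and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class BurnRateRule:
    long_window: str   # window that shows the burn is sustained
    short_window: str  # window that shows the burn is still happening
    threshold: float   # burn rate both windows must reach
    severity: str      # label attached to the alert (assumed names)


# Assumed, illustrative values; the real ones are what this ticket should decide.
RULES = (
    BurnRateRule("1h", "5m", 14.4, "critical"),
    BurnRateRule("6h", "30m", 6.0, "critical"),
    BurnRateRule("3d", "6h", 1.0, "warning"),
)


def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how many times faster than 'exactly on budget' errors accrue."""
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def evaluate(error_ratios: Mapping[str, float], slo_target: float) -> Optional[str]:
    """Return the severity of the first rule that fires, or None.

    error_ratios maps a window name (e.g. "1h") to the fraction of failed
    requests or probes observed over that window.
    """
    for rule in RULES:
        long_burn = burn_rate(error_ratios[rule.long_window], slo_target)
        short_burn = burn_rate(error_ratios[rule.short_window], slo_target)
        # Both windows must agree, so a brief spike alone does not alert.
        if long_burn >= rule.threshold and short_burn >= rule.threshold:
            return rule.severity
    return None
```

      With a 99% availability target, a five-minute spike that has not yet moved the one-hour error ratio produces no alert under this scheme, while a slow, sustained burn produces only a "warning" that could be routed to a ticket rather than a page.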

              aasthana@redhat.com Anish Asthana
              acorvin@redhat.com Alex Corvin
              Jorge Garcia Oncins
