Hybrid Cloud Console / RHCLOUD-38689

Review the severity of all our Prometheus alerts


    • Quality / Stability / Reliability

      The severity of a Prometheus alert not only describes how important the alert is, but also determines whether the on-call engineers are paged when the alert fires.

      The following severities will trigger the on-call process: warning, high, critical. They can only be used with alerts that are properly tested in app-interface and have an associated runbook.
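
      The rule above can be sketched as a small check. This is only an illustration: the `Alert` record and its field names are hypothetical and do not reflect the app-interface schema; only the three paging severities come from this ticket.

```python
from dataclasses import dataclass

# Severities that trigger the on-call process, as listed in this ticket.
PAGING_SEVERITIES = {"warning", "high", "critical"}


@dataclass
class Alert:
    # Hypothetical record; field names are illustrative only.
    name: str
    severity: str
    tested: bool
    has_runbook: bool


def pages_on_call(alert: Alert) -> bool:
    """Return True if the alert's severity triggers the on-call process."""
    return alert.severity in PAGING_SEVERITIES


def may_use_severity(alert: Alert) -> bool:
    """A paging severity is only allowed for tested alerts with a runbook."""
    if not pages_on_call(alert):
        return True
    return alert.tested and alert.has_runbook
```

      An alert for which `may_use_severity` returns False would need a test, a runbook, or both before its severity can be raised.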

      The vast majority of our alerts have an info or medium severity, even though some of them should trigger the on-call process. With this ticket, we'll determine which alerts should have their severity increased.

      Acceptance criteria:

      • The severity of each existing alert is reviewed.
      • All alerts that need a higher severity are listed in a spreadsheet with details about whether the alert is tested and a runbook exists.
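
      One possible shape for the spreadsheet is a CSV with one row per alert that needs a higher severity. The sketch below is an assumption about the layout: the column names and the `write_review_sheet` helper are hypothetical and are not prescribed by this ticket.

```python
import csv
import io

# Hypothetical column layout for the review spreadsheet.
COLUMNS = ["alert", "current_severity", "proposed_severity", "tested", "has_runbook"]


def write_review_sheet(rows, out):
    """Write one CSV row per alert that needs a higher severity."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    # Placeholder data, not a real alert.
    example = [{
        "alert": "ExampleAlert",
        "current_severity": "medium",
        "proposed_severity": "high",
        "tested": True,
        "has_runbook": False,
    }]
    buffer = io.StringIO()
    write_review_sheet(example, buffer)
    print(buffer.getvalue())
```

      Keeping the tested and runbook columns next to the proposed severity makes it easy to see which alerts still need work before the change can be applied.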

              glepage@redhat.com Gwenneg Lepage