OpenShift Over the Air / OTA-1082

Create alert for conditions that caused 2023-11-24 CannotRetrieveUpdate WebRCA-#itn-2023-00159


    • Type: Task
    • Resolution: Done
    • Priority: Major
    • OTA 245, OTA 246, OTA 247

      During the incident, only the high-severity PEHighLatency alert fired. AppSRE was not paged; SRE-P was paged only through client-side alerts.

      We need to create SLO burn-rate alerts that match the SLO document.
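      A burn-rate alert compares how fast the service is consuming its error budget against the rate that would exactly exhaust it over the SLO period. The sketch below illustrates the arithmetic of the common multi-window approach (described in the Google SRE Workbook's "Alerting on SLOs" chapter), where a page fires only when both a long and a short window burn too fast. The 99.9% target, the 14.4 threshold, and the names WindowCounts, burn_rate, and should_page are illustrative assumptions, not values or code from the Cincinnati SLO document; a real deployment would express this as an alerting rule rather than in Python.

```python
from dataclasses import dataclass

# Assumed availability target; the real value belongs in the SLO document.
SLO_TARGET = 0.999
ERROR_BUDGET = 1.0 - SLO_TARGET  # fraction of requests allowed to fail


@dataclass
class WindowCounts:
    """Hypothetical request counts observed over one alerting window."""
    total: int
    errors: int


def burn_rate(w: WindowCounts) -> float:
    """Error ratio divided by the error budget; 1.0 means exactly on budget."""
    if w.total == 0:
        return 0.0  # no traffic in the window, so nothing is burning
    return (w.errors / w.total) / ERROR_BUDGET


def should_page(long_w: WindowCounts, short_w: WindowCounts,
                threshold: float = 14.4) -> bool:
    """Page only if both windows exceed the threshold.

    The long window (e.g. 1h) filters out brief blips; the short window
    (e.g. 5m) lets the alert clear soon after the service recovers.
    14.4 is an assumed threshold (2% of a 30-day budget spent in 1h).
    """
    return burn_rate(long_w) > threshold and burn_rate(short_w) > threshold


if __name__ == "__main__":
    outage_1h = WindowCounts(total=3600, errors=3600)
    outage_5m = WindowCounts(total=300, errors=300)
    recovered_5m = WindowCounts(total=300, errors=0)
    print(should_page(outage_1h, outage_5m))     # full outage: page
    print(should_page(outage_1h, recovered_5m))  # already recovered: no page
```

      With a full outage like the one behind CannotRetrieveUpdate, the burn rate is far above the threshold in both windows, so this style of alert would page the service owners directly instead of relying on client-side alerts.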

      Compare our alert settings with the client-side alerts, and ensure that when the service is unavailable, AppSRE and the Cincinnati team are paged before SRE-P.

      RCA document: link

       

      Definition of done:

      • Create SLO burn-rate alerts that match the SLO document.
      • Ensure each alert has an executable runbook and a working Grafana dashboard.

              trking W. Trevor King
              rh-ee-dwan Di Wang
              Votes: 0
              Watchers: 6
