  1. OpenShift Pipelines
  2. SRVKP-5551

Stub epic for KONFLUX-123


    • Type: Epic
    • Resolution: Done
    • Priority: Major
    • Stub epic for KONFLUX-123
    • To Do
    • KONFLUX-123 - Konflux Availability SLO

      The expected outcome is a set of Konflux service-specific availability SLOs that track the availability of Konflux as a whole. Each availability SLO will have alerts and dashboards based on it. Each triggered alert should then have a Standard Operating Procedure (SOP) that an SRE can follow to collect diagnostics, respond to the incident, escalate properly, and so on.

      The goal is to build on the availability metrics collected by KONFLUX-136 and KONFLUX-2066. With that data we can define SLOs for each Konflux service, accompanied by the SOPs, escalation policies, alerting rules, and dashboards that AppSRE requires.

      To achieve this, we have to map the relevant Konflux services and define their availability SLOs. The suggested main signal on which to base each service availability SLO is konflux_up.
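
      As a rough illustration of how an availability SLO could be evaluated from konflux_up, the sketch below computes availability over a window of samples and the share of the error budget consumed. It assumes that konflux_up is sampled at a fixed interval and reports 1 when the service is up and 0 otherwise; that semantics, the function names, and the 99% target are hypothetical and not taken from this epic.

```python
from typing import Sequence


def availability(samples: Sequence[float]) -> float:
    """Fraction of samples in which the service reported up (value 1).

    Assumption: each sample is a konflux_up reading, 1 = up, 0 = down.
    """
    if not samples:
        raise ValueError("no samples in the evaluation window")
    up = sum(1 for s in samples if s == 1)
    return up / len(samples)


def budget_consumed(samples: Sequence[float], slo_target: float) -> float:
    """Share of the error budget used in this window (1.0 = fully spent).

    slo_target is a hypothetical availability objective such as 0.99.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1 - availability(samples)) / (1 - slo_target)


def meets_slo(samples: Sequence[float], slo_target: float) -> bool:
    """True when measured availability is at or above the objective."""
    return availability(samples) >= slo_target


if __name__ == "__main__":
    # Hypothetical window: 1000 samples, 5 of them reporting down.
    window = [1] * 995 + [0] * 5
    print(availability(window))
    print(meets_slo(window, 0.99))
    print(budget_consumed(window, 0.99))
```

      In practice the same ratio would more likely be expressed as a query in the monitoring stack; the Python form only makes the arithmetic explicit.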

      For each mapped service, check whether it already has monitoring dashboards and alerts created based on the defined availability SLO. If such alerts or dashboards are missing, create them.
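
      One common way to derive an alert from an availability SLO is a burn-rate check over a long and a short window. The sketch below shows that idea only; the epic does not prescribe it. The function names, the window contents, and the threshold are hypothetical, and konflux_up is again assumed to report 1 for up and 0 for down.

```python
from typing import Sequence


def error_ratio(samples: Sequence[float]) -> float:
    """Fraction of samples in which the service did not report up."""
    if not samples:
        raise ValueError("no samples in the window")
    return sum(1 for s in samples if s != 1) / len(samples)


def burn_rate(samples: Sequence[float], slo_target: float) -> float:
    """How fast the error budget is being spent in this window.

    A value of 1.0 means the budget would last exactly the SLO period.
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return error_ratio(samples) / (1 - slo_target)


def should_alert(
    long_window: Sequence[float],
    short_window: Sequence[float],
    slo_target: float,
    threshold: float,
) -> bool:
    """Fire only when both windows burn budget faster than the threshold.

    The short window keeps the alert from lingering after recovery.
    """
    return (
        burn_rate(long_window, slo_target) >= threshold
        and burn_rate(short_window, slo_target) >= threshold
    )


if __name__ == "__main__":
    # Hypothetical readings: a long window with 20% failures and a
    # short window that is entirely down.
    long_w = [1] * 80 + [0] * 20
    short_w = [0] * 5
    print(should_alert(long_w, short_w, slo_target=0.99, threshold=14.4))
```

      Requiring both windows to exceed the threshold trades a little detection delay for fewer alerts that fire on a brief blip or stay active after the service has recovered.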

      Also, for each alert created, check whether SOPs are defined for that service. Again, if such SOPs are missing, create them.

      You'll find templates for potential stories for this epic in KONFLUX-3355.

      Acceptance criteria:

      • A defined availability SLO for <service / team>
      • Alerts created for the availability SLO
      • Dashboards created for the availability SLO
      • SOPs created for the availability SLO

            Assignee: Unassigned
            Reporter: Matheus Boy (mftboy)
