  1. Red Hat OpenShift Data Science
  2. RHODS-3947

Prometheus alerts paging MT-SRE are calculated for SLO of 98% instead of 99.95%


    • RHODS 1.12
    • Medium

      Description of problem:

      The RHODS Prometheus alerts that send PagerDuty alerts to MT-SRE are calculated for an SLO of 98%, because that was the SLO defined for RHODS Field Trials. Affected alerts:

      • SLOs-haproxy_backend_http_responses_total
      • SLOs-probe_success

       

      Because the Service Level Objective for RHODS LA is 99.95%, I think these alerts should be updated to match it.

       

      Note: I think those rules were generated using this online tool https://promtools.dev/alerts/errors
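To illustrate why the SLO value matters for these alerts, here is a minimal sketch of how the alert thresholds scale with the SLO. It assumes the rules follow the multi-window burn-rate pattern described in the Google SRE Workbook; the burn-rate factors and the function names are assumptions for illustration and should be checked against the actual rule expressions.

```python
# Burn-rate factors per long window, as in the Google SRE Workbook pattern.
# Assumption: the RHODS rules use this pattern; verify against the real expressions.
BURN_RATES = {"1h": 14.4, "6h": 6.0, "1d": 3.0, "3d": 1.0}


def error_budget(slo_percent):
    """Fraction of requests (or probes) allowed to fail under the SLO."""
    return 1.0 - slo_percent / 100.0


def burn_rate_thresholds(slo_percent):
    """Error-ratio threshold per window: burn rate multiplied by the error budget."""
    budget = error_budget(slo_percent)
    return {window: rate * budget for window, rate in BURN_RATES.items()}


if __name__ == "__main__":
    for slo in (98.0, 99.95):
        print(f"SLO {slo}%: error budget {error_budget(slo):.4%}")
        for window, threshold in burn_rate_thresholds(slo).items():
            print(f"  {window}: alert when error ratio > {threshold:.5f}")
```

Under these assumptions the error budget drops from 2% to 0.05%, a factor of 40, so every threshold computed for 98% is 40 times too permissive for a 99.95% SLO.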

       

       

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce

      1. Log in to RHODS Prometheus.
      2. Go to Status > Rules.
      3. Check the expressions for SLOs-haproxy_backend_http_responses_total and SLOs-probe_success.
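The same check could also be scripted against the JSON returned by the Prometheus `/api/v1/rules` endpoint. The sketch below only filters an already-parsed response. The helper name and the sample payload are hypothetical, and the sample queries are placeholders, not the real RHODS expressions. The issue does not say whether the two names are group names or alert names, so both are matched.

```python
# Names taken from the issue; they may be rule group names or alert names.
TARGETS = ("SLOs-haproxy_backend_http_responses_total", "SLOs-probe_success")


def slo_expressions(rules_response, targets=TARGETS):
    """Return (group, rule, query) tuples for rules whose group or name is targeted."""
    matches = []
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if group.get("name") in targets or rule.get("name") in targets:
                matches.append((group.get("name"), rule.get("name"), rule.get("query", "")))
    return matches


# Hypothetical, heavily trimmed response; queries are placeholders.
SAMPLE_RESPONSE = {
    "status": "success",
    "data": {"groups": [
        {"name": "SLOs-probe_success", "rules": [
            {"name": "ExampleProbeAlert", "query": "placeholder_expr_a > 0.288", "type": "alerting"},
        ]},
        {"name": "unrelated-group", "rules": [
            {"name": "OtherAlert", "query": "placeholder_expr_b > 1", "type": "alerting"},
        ]},
    ]},
}

if __name__ == "__main__":
    for group, rule, query in slo_expressions(SAMPLE_RESPONSE):
        print(f"{group} / {rule}: {query}")
```

A list of tuples is returned rather than a dictionary because several alert rules in one group can share the same name while differing only in window or severity.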

      Build Details:

      RHODS 1.10.0

       

      Live Build:

      quay.io/lferrnan/rhods-operator-live-catalog:1.11.1-prometheus-slo 

       

      PR:

      https://github.com/red-hat-data-services/odh-deployer/pull/232

        1. prometheus-alerts-new-slo-9995.png
          252 kB
          Jorge Garcia Oncins
        2. prometheus-alerts-sla-rhods-1.10.0.png
          289 kB
          Jorge Garcia Oncins
        3. Screenshot from 2022-06-01 20-00-02.png
          317 kB
          Pablo Felix

              lferrnan@redhat.com Lucas Fernandez Aragon
              rhn-support-jgarciao Jorge Garcia Oncins
              Jorge Garcia Oncins Jorge Garcia Oncins
              Votes: 0
              Watchers: 7
