Project: Managed Service - API
Issue: MGDAPI-4485

Create an alert for absent CRO metrics


    • Type: Task
    • Resolution: Done
    • Priority: Major
    • Version: 1.28.0
    • Sprint: RHOAM Sprint 31

      WHAT
      To help with root cause analysis of CRO alerts, ensure that the existing CRO alerts do not fire because of missing (absent) metrics.
      Instead, create a separate alert that fires when the metrics are absent, so that SRE can distinguish missing metrics from an actual resource failure. See example here.
      Also see the absent metrics doc, where this topic was discussed with SRE.

      HOW
      1) Remove the use of absent() from the existing CRO alerts.
      2) Create an alert that fires when CRO metrics are absent. The metrics checked by this alert should be the same metrics used by the alerts changed in 1).
      3) Set the severity to critical, as some of the alerts from 1) are critical.
      4) Add an SOP for the new alert.
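      A minimal Python sketch of step 2), assuming the new alert is written as a standard Prometheus alerting rule: it joins one absent() call per metric with `or`, so the expression returns a series whenever any listed metric has no data. The metric names, the alert name `RHOAMCloudResourceOperatorMetricsMissing` and the `for` duration are hypothetical placeholders; the real metric list should come from the expressions of the alerts changed in 1).

```python
import json
from typing import Dict, List

# Hypothetical metric names: replace with the metrics actually used by the
# CRO alerts changed in step 1).
CRO_METRICS: List[str] = [
    "cro_redis_availability",     # hypothetical
    "cro_postgres_availability",  # hypothetical
]


def absent_expr(metrics: List[str]) -> str:
    """Build a PromQL expression that returns a series if ANY metric is absent.

    absent(m) returns a single series with value 1 when m has no series,
    and an empty result otherwise; `or` takes the union of the results.
    """
    if not metrics:
        raise ValueError("at least one metric name is required")
    return " or ".join(f"absent({m})" for m in metrics)


def absent_alert_rule(metrics: List[str], for_duration: str = "5m") -> Dict:
    """Return one alerting rule as a dict, shaped like an entry in a rule group."""
    return {
        "alert": "RHOAMCloudResourceOperatorMetricsMissing",  # hypothetical name
        "expr": absent_expr(metrics),
        "for": for_duration,  # hypothetical; avoids firing on a single missed scrape
        "labels": {"severity": "critical"},  # step 3)
        "annotations": {
            # The SOP link from step 4) would also be added here.
            "message": "One or more CRO metrics are absent.",
        },
    }


if __name__ == "__main__":
    # Print the rule so it can be copied into a rule group's `rules:` list.
    print(json.dumps(absent_alert_rule(CRO_METRICS), indent=2))
```

      Because absent() on a bare metric name returns a series with no identifying labels, this combined alert does not say which metric is missing. One rule per metric (or adding a label to each term, e.g. with label_replace) is an alternative if the SOP needs that detail.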

      Examples of affected alerts (this list may not be exhaustive):

      • Ratelimit-Service-Redis-RhoamRedisCacheUnavailable
      • Rhsso-Postgres-RhoamPostgresInstanceUnavailable
      • Rhssouser-Postgres-RhoamPostgresInstanceUnavailable
      • Threescale-Backend-Redis-RhoamRedisCacheUnavailable

      TESTS

      • Ensure CRO alerts only fire when the metrics used in those alerts are present and the query can be fully determined.
      • Ensure the absent-metrics alert fires when metrics are absent; test this by stopping CRO for a period of time.
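      The second test could be checked against the Prometheus HTTP API, which lists active alerts at /api/v1/alerts. A minimal sketch, assuming the new alert uses the hypothetical name `RHOAMCloudResourceOperatorMetricsMissing` and that the alert names listed above match the `alertname` label exactly (in practice they may carry a resource prefix and need adjusting): while CRO is stopped, the absent-metrics alert should be firing and none of the listed CRO alerts should be.

```python
import json
import urllib.request
from typing import Set

# Hypothetical name of the new absent-metrics alert.
ABSENT_ALERT = "RHOAMCloudResourceOperatorMetricsMissing"

# Alert names taken from the ticket's list of affected alerts; assumed to
# match the `alertname` label, which may not hold in practice.
CRO_ALERTS: Set[str] = {
    "Ratelimit-Service-Redis-RhoamRedisCacheUnavailable",
    "Rhsso-Postgres-RhoamPostgresInstanceUnavailable",
    "Rhssouser-Postgres-RhoamPostgresInstanceUnavailable",
    "Threescale-Backend-Redis-RhoamRedisCacheUnavailable",
}


def fetch_alerts(prometheus_url: str) -> dict:
    """GET <prometheus_url>/api/v1/alerts and decode the JSON body."""
    url = f"{prometheus_url.rstrip('/')}/api/v1/alerts"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def firing_alert_names(payload: dict) -> Set[str]:
    """Return names of alerts in state 'firing' (pending alerts are ignored)."""
    alerts = payload.get("data", {}).get("alerts", [])
    return {
        a.get("labels", {}).get("alertname", "")
        for a in alerts
        if a.get("state") == "firing"
    }


def check_cro_stopped(firing: Set[str]) -> None:
    """Checks for the 'CRO stopped' test case; raises AssertionError on failure."""
    if ABSENT_ALERT not in firing:
        raise AssertionError("absent-metrics alert is not firing")
    unexpected = firing & CRO_ALERTS
    if unexpected:
        raise AssertionError(f"CRO alerts fired on missing data: {sorted(unexpected)}")
```

      Usage would be along the lines of check_cro_stopped(firing_alert_names(fetch_alerts(url))), where url points at a reachable Prometheus (for example via a port-forward). Run it only after CRO has been stopped for longer than the alert's `for` duration plus a scrape interval or so; otherwise the absent-metrics alert may still be pending.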

              tdimov@redhat.com Tsvetoslav Dimov (Inactive)
              bgallagh@redhat.com Brian Gallagher
              Carl Kyrillos
              Votes: 0
              Watchers: 2
