  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-9201

Collect metrics which count the number of results of alarm evaluations


    • Collect alarming evaluation result counters
    • 5
    • False
    • False
    • OBSDA-824 - Enhance Observability on OpenStack observability components
    • Not Selected
    • Planned
    • Proposed
    • No Docs Impact
    • To Do
    • OBSDA-824 - Enhance Observability on OpenStack observability components
    • Proposed
    • Proposed
    • 67% To Do, 33% In Progress, 0% Done

      Epic Overview

      Aodh should count the results of its alarm evaluations, that is, keep track of how many times alarms were evaluated as "OK", "Alarm" or "Insufficient data". These counters should be exposed through the Aodh HTTP API and be retrievable from the CLI with aodhclient.
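
      A minimal sketch of the counting idea, not Aodh's actual implementation: the class and method names below are hypothetical, and only the three state strings correspond to the evaluation results named above.

```python
from collections import Counter

# The three evaluation results named in the epic, written as alarm state strings.
STATES = ("ok", "alarm", "insufficient data")


class EvaluationCounters:
    """Hypothetical in-memory tally of alarm evaluation results."""

    def __init__(self):
        # Start every known state at zero so a snapshot always lists all three.
        self._counts = Counter({state: 0 for state in STATES})

    def record(self, state):
        """Increment the counter for one evaluation result."""
        if state not in STATES:
            raise ValueError(f"unknown evaluation result: {state!r}")
        self._counts[state] += 1

    def snapshot(self):
        # Return a plain dict copy so callers cannot mutate the tally.
        return dict(self._counts)


counters = EvaluationCounters()
for result in ["ok", "ok", "insufficient data", "alarm", "insufficient data"]:
    counters.record(result)

print(counters.snapshot())
```

      A snapshot like this is the kind of payload an API endpoint or a CLI command could return; how Aodh would store and serve it is left open here.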

      Afterwards, the Ceilometer central agent should gain the ability to poll these metrics.
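
      The polling step could look roughly like the sketch below. It is an illustration under stated assumptions: the `Sample` dataclass is a simplified stand-in rather than Ceilometer's own sample class, and the metric names, unit, and `counters_to_samples` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """Hypothetical, simplified stand-in for a telemetry sample."""
    name: str
    type: str
    unit: str
    volume: int


def counters_to_samples(counters):
    """Turn one snapshot of evaluation counters into one sample per state."""
    samples = []
    for state, count in sorted(counters.items()):
        samples.append(
            Sample(
                # Hypothetical naming scheme: spaces are replaced by underscores.
                name="alarm.evaluation." + state.replace(" ", "_"),
                # Assumption: the counters keep growing while the service runs.
                type="cumulative",
                unit="evaluation",
                volume=count,
            )
        )
    return samples


# A snapshot as it might be returned by the Aodh API (values are made up).
snapshot = {"ok": 120, "alarm": 3, "insufficient data": 17}
for sample in counters_to_samples(snapshot):
    print(sample.name, sample.volume)
```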

      In the end, these metrics could be displayed on a dashboard. Such a dashboard could be one of the fastest ways to notify users that alarming is not working, which could indicate an issue in metric collection, transport, storage, or retrieval (for example, a wrong query in an autoscaling Heat template). Users would then follow up with other troubleshooting steps. Visualization is covered in a different epic.

      Goals

      As a customer, I want these counters to warn me about possible issues with alarming (which also imply issues with autoscaling).
      Looking at these counters could also help support respond to customers faster.

      Requirements

      A list of specific needs or objectives that an Epic must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the epic shifts. If a non-MVP requirement slips, it does not shift the epic.

      Requirement                                                       Notes   Is MVP?
      Counters are collected and exposed by Aodh                                Yes
      Counters are retrievable by aodhclient in CLI                             No
      Counters are polled by Ceilometer and transported to Prometheus           Yes

      (Optional) Use Cases

      • Display the "insufficient data" in a dashboard to visually show that there is an issue with an alarm
      • Configure an Alertmanager alert to notify users about unusual growth of the "insufficient data" counter
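
      One way to reason about "unusual growth" is to compare two counter snapshots and look at the share of recent evaluations that ended in "insufficient data". The sketch below is only an illustration of that arithmetic: the function names and the 0.5 threshold are hypothetical, and it assumes the counters did not reset between the two snapshots.

```python
def insufficient_data_ratio(previous, current):
    """Share of evaluations between two snapshots that ended in 'insufficient data'."""
    # Per-state increase since the previous snapshot.
    deltas = {state: current[state] - previous.get(state, 0) for state in current}
    total = sum(deltas.values())
    if total <= 0:
        # No new evaluations (or a counter reset): nothing to judge.
        return 0.0
    return deltas.get("insufficient data", 0) / total


def is_unusual(previous, current, threshold=0.5):
    """Flag the interval when the ratio exceeds a (hypothetical) threshold."""
    return insufficient_data_ratio(previous, current) > threshold


# Made-up snapshots taken at two polling intervals.
previous = {"ok": 100, "alarm": 2, "insufficient data": 5}
current = {"ok": 104, "alarm": 2, "insufficient data": 21}

print(insufficient_data_ratio(previous, current))
print(is_unusual(previous, current))
```

      In this made-up interval, 16 of the 20 new evaluations ended in "insufficient data", so the check would flag it. A real alert would more likely be expressed as a rule over the stored metric than as application code.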

      Out of Scope

      Inclusion in dashboards. The following Epic should take care of that: OSPRH-7416

              rh-ee-jwysogla Jaromir Wysoglad
              rhos-dfg-cloudops