Epic
Resolution: Unresolved
Collect alarming evaluation result counters
OBSDA-824 - Enhance Observability on OpenStack observability components
67% To Do, 33% In Progress, 0% Done
Epic Overview
We should count evaluation results when Aodh evaluates alarms. That means keeping track of how many times alarms were evaluated as "OK", "Alarm" or "Insufficient data". These counters should be exposed through the Aodh HTTP API and be retrievable from the CLI with aodhclient.
Afterwards, functionality to poll these metrics should be added to the Ceilometer central agent.
In the end, these metrics could be displayed on a dashboard. Such a dashboard could be one of the fastest ways to notify users that alarming is not working, which could point to an issue in metric collection, transport, storage or retrieval (for example, a wrong query in an autoscaling Heat template). Users would then follow up with other troubleshooting steps. Visualization is covered in a different epic.
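As an illustration of the counters described above, here is a minimal sketch of an in-memory tally with one counter per evaluation result. The `EvaluationCounters` class and its method names are hypothetical and are not part of Aodh; how Aodh would actually store and expose these values is not decided in this epic.

```python
from collections import Counter
from threading import Lock

# The three evaluation results named in this epic.
STATES = ("ok", "alarm", "insufficient data")


class EvaluationCounters:
    """Hypothetical in-memory tally of alarm evaluation results."""

    def __init__(self):
        self._lock = Lock()
        # Start every state at zero so all three always appear in a snapshot.
        self._counts = Counter({state: 0 for state in STATES})

    def record(self, state):
        """Count one evaluation that ended in the given state."""
        if state not in STATES:
            raise ValueError(f"unknown alarm state: {state!r}")
        with self._lock:
            self._counts[state] += 1

    def snapshot(self):
        """Return a copy of the counters, e.g. for an API response body."""
        with self._lock:
            return dict(self._counts)


counters = EvaluationCounters()
for result in ("ok", "insufficient data", "insufficient data", "alarm"):
    counters.record(result)
print(counters.snapshot())
```

The lock is there only because an evaluator may process alarms from several threads; a single-threaded evaluator would not need it.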
Goals
As a customer, I want these counters to warn me about possible issues with alarming (which also means issues with autoscaling).
Looking at these counters could also help support respond to customers faster.
Requirements
A list of specific needs or objectives that an Epic must deliver to satisfy the Feature. Some requirements will be flagged as MVP. If an MVP requirement gets shifted, the epic shifts. If a non-MVP requirement slips, it does not shift the epic.
Requirement | Notes | Is MVP? |
Counters are collected and exposed by Aodh | | Yes |
Counters are retrievable by aodhclient in CLI | | No |
Counters are polled by Ceilometer and transported to Prometheus | | Yes |
(Optional) Use Cases
- Display the "insufficient data" counter in a dashboard to show visually that there is an issue with an alarm
- Configure an Alertmanager alert to notify users about unusual growth of the "insufficient data" counter
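The "unusual growth" check in the second use case could look roughly like the sketch below, which sums the increase of a counter across a window of polled values. The function names, the sample values and the threshold are all hypothetical; in a real deployment this logic would more likely live in a Prometheus alerting rule than in Python.

```python
def counter_increase(samples):
    """Total increase over a series of polled counter values.

    A drop between two polls is treated as a counter reset (for example a
    service restart), so the value after the drop counts as new growth.
    """
    increase = 0
    for previous, current in zip(samples, samples[1:]):
        increase += current - previous if current >= previous else current
    return increase


def grows_unusually(samples, max_increase):
    """Return True when the counter grew by more than max_increase."""
    return counter_increase(samples) > max_increase


# Hypothetical "insufficient data" counter values from five consecutive polls.
polled = [10, 12, 13, 40, 75]
print(grows_unusually(polled, max_increase=20))
```

Choosing `max_increase` depends on how many alarms are evaluated per polling interval, so it would need tuning per deployment.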
Out of Scope
Inclusion in dashboards. That is covered by the following epic: OSPRH-7416