- Epic
- Resolution: Done
- Normal
- None
- ACM 2.14.0
- None
- [Dev Preview] Incident Management UI in ACM
- Product / Portfolio Work
- False
- False
- Not Selected
- To Do
OCP/Telco Definition of Done
https://docs.google.com/document/d/1TP2Av7zHXz4_fmeX4q9HB0m9cqSZ4F6Jd4AiVoaF_2s/edit#heading=h.gaa58bzbvwde
Epic Template descriptions and documentation.
https://docs.google.com/document/d/14CUCEg6hQ_jpsFzJtWo29GfFVWmun2Uivrxq3_Fkgdg/edit
ACM-wide Product Requirements (Top-level Epics)
https://docs.google.com/document/d/1uIp6nS2QZ766UFuZBaC9USs8dW_I5wVdtYF9sUObYKg/edit
Epic Goal
The goal of this epic is to introduce the Incident Detection feature in MCO and MCOA (multicluster observability) in ACM. The plan is to introduce this feature as a dev preview in 2.14.
...
Why is this important?
Dealing with the number of active alerts in a cluster can be challenging. Incident detection groups alerts that occur around the same time into incidents, helping you focus on identifying the root cause of alert spikes rather than managing numerous individual alerts. The timeline of incidents is available in the OCP web console ("Observe" -> "Incidents") of the spoke clusters.
In the multicluster observability environment, the plan is to introduce a Grafana/Perses dashboard with a fleet-level overview of the incidents.
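To illustrate the idea of grouping alerts that occur around the same time, here is a minimal sketch. It is not the cluster-health-analyzer's actual algorithm, which this epic does not describe. The `Alert` fields, the `group_into_incidents` helper, and the 5-minute `max_gap` threshold are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical fields, for illustration only.
    name: str
    cluster: str
    started_at: datetime


def group_into_incidents(alerts, max_gap=timedelta(minutes=5)):
    """Group alerts per cluster into incidents by start-time proximity.

    An alert joins the current incident of its cluster when it starts
    within max_gap of the previous alert in that incident; otherwise it
    opens a new incident. The threshold is an assumption.
    """
    incidents = []
    current_by_cluster = {}
    for alert in sorted(alerts, key=lambda a: (a.cluster, a.started_at)):
        current = current_by_cluster.get(alert.cluster)
        if current and alert.started_at - current[-1].started_at <= max_gap:
            current.append(alert)
        else:
            current = [alert]
            current_by_cluster[alert.cluster] = current
            incidents.append(current)
    return incidents


t0 = datetime(2025, 1, 1, 12, 0)
sample = [
    Alert("A", "spoke-1", t0),
    Alert("B", "spoke-1", t0 + timedelta(minutes=2)),
    Alert("C", "spoke-1", t0 + timedelta(minutes=30)),
    Alert("D", "spoke-2", t0 + timedelta(minutes=1)),
]
for incident in group_into_incidents(sample):
    print(incident[0].cluster, [a.name for a in incident])
```

In this sketch, a fleet-level overview would amount to counting the resulting incidents per cluster. How the planned dashboard actually aggregates them is not specified here.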
Scenarios
Acceptance Criteria
Dependencies (internal and external)
Previous Work (Optional):
The grouping of alerts is implemented on the spoke cluster by the cluster-health-analyzer. The cluster-health-analyzer is part of the cluster-observability-operator (since version 1.0.0), so installing this operator is the de facto requirement for this feature.
Open questions:
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Doc issue opened with a completed template. Separate doc issue opened for any deprecation, removal, or any current known issue/troubleshooting removal from the doc, if applicable.
- Considerations were made for Extended Update Support (EUS)
- is duplicated by
- ACM-12478 [Dev Preview] ACM Console Incident Overview Dashboard
- Closed
- links to