  1. Red Hat Advanced Cluster Management
  2. ACM-18751

[Dev Preview] Incident Management UI in ACM


    • Type: Epic
    • Resolution: Done
    • Priority: Normal
    • ACM 2.14.0
    • Observability
    • Product / Portfolio Work

      OCP/Telco Definition of Done
      https://docs.google.com/document/d/1TP2Av7zHXz4_fmeX4q9HB0m9cqSZ4F6Jd4AiVoaF_2s/edit#heading=h.gaa58bzbvwde
      Epic Template descriptions and documentation.
      https://docs.google.com/document/d/14CUCEg6hQ_jpsFzJtWo29GfFVWmun2Uivrxq3_Fkgdg/edit
      ACM-wide Product Requirements (Top-level Epics)
      https://docs.google.com/document/d/1uIp6nS2QZ766UFuZBaC9USs8dW_I5wVdtYF9sUObYKg/edit

      Epic Goal

      The goal of this epic is to introduce the Incident Detection feature into multicluster observability (MCO and MCOA) in ACM. The plan is to deliver this feature as a Dev Preview in ACM 2.14.

      ...

      Why is this important?

      Dealing with the number of active alerts in a cluster can be challenging. Incident detection groups alerts that occur around the same time into incidents, helping you focus on identifying the root cause of alert spikes rather than managing numerous individual alerts. The timeline of incidents is available in the OCP web console ("Observe" -> "Incidents") of the spoke clusters.
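
      To illustrate the idea of grouping alerts that occur around the same time, here is a minimal sketch. It is not the cluster-health-analyzer's actual algorithm: the `Alert` and `Incident` types, the `group_alerts` function, and the 5-minute window are all hypothetical names and values chosen for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Alert:
    # Hypothetical, simplified alert record (not a real API type).
    name: str
    started_at: datetime


@dataclass
class Incident:
    # Hypothetical container for alerts grouped by time proximity.
    alerts: list = field(default_factory=list)


def group_alerts(alerts, window=timedelta(minutes=5)):
    """Group alerts into incidents by start-time proximity.

    An alert joins the current incident if it started within `window`
    of the previous alert in that incident; otherwise it opens a new one.
    The 5-minute default is an assumption, not a documented value.
    """
    incidents = []
    for alert in sorted(alerts, key=lambda a: a.started_at):
        if incidents and alert.started_at - incidents[-1].alerts[-1].started_at <= window:
            incidents[-1].alerts.append(alert)
        else:
            incidents.append(Incident(alerts=[alert]))
    return incidents
```

      With this sketch, two alerts starting two minutes apart land in one incident, while an alert starting half an hour later opens a second one. A real implementation would likely also consider which components the alerts affect, not only their timing.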

      In the multicluster observability environment, the plan is to introduce a Grafana/Perses dashboard that gives a fleet-level overview of the incidents.

      Scenarios

      Acceptance Criteria

      Dependencies (internal and external)

      Previous Work (Optional):

      The grouping of the alerts is implemented in the spoke cluster by the cluster-health-analyzer. The cluster-health-analyzer has been part of the cluster-observability-operator since version 1.0.0, so installing this operator is the de facto required deployment procedure for this feature.

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub
        Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub
        Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Doc issue opened with a completed template. Separate doc issue
        opened for any deprecation, removal, or any current known
        issue/troubleshooting removal from the doc, if applicable.
      • Considerations were made for Extended Update Support (EUS)

              tremes1@redhat.com Tomas Remes
              mzardab@redhat.com Moad Zardab