  1. Red Hat Advanced Cluster Management
  2. ACM-1856

[RFE] Raise an alert when the cluster status changes to Unknown on the 'Clusters' page


    • Epic
    • Resolution: Unresolved
    • Normal
    • None
    • None
    • Observability
    • [RFE] Raise alert when cluster status is Unknown
    • False
    • None
    • False
    • To Do

      Epic Goal

      • Currently, no alert is raised when a cluster's status changes to 'Unknown'. The user only finds out by explicitly navigating to the 'Clusters' page and checking the status of every cluster manually. We want to avoid this manual check.

      Why is this important?

      In features like Regional Disaster Recovery, it is important that the cluster admin is notified when any of the clusters goes down. That is the point at which the cluster admin decides whether to perform a failover operation, so that as little data as possible is lost and the Pods/PVCs running on the primary cluster can eventually be recovered on the peer cluster. The goal is to minimize the recovery time objective (RTO), so alerting the admin promptly is crucial for this solution to be efficient.

      Scenarios

      1. ...

      1. Install ACM on OCP
      2. Create multicluster hub
      3. Import 2 managed OCP clusters using the kubeconfig import method.
      4. Once they are ready, power off the nodes of either cluster and check whether an alert is raised to notify the cluster admin of a disaster that could have occurred.

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      An alert should be raised when the cluster status changes to 'Unknown' on the 'Clusters' page.
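
      The requested behavior can be sketched as a check for clusters whose availability has just changed to 'Unknown'. This is a minimal illustration, not the ACM implementation. It assumes each cluster is a dict shaped like a ManagedCluster resource and that the status shown on the 'Clusters' page is derived from a condition of type ManagedClusterConditionAvailable. The helper names availability and newly_unknown are hypothetical.

```python
from typing import Dict, List

# Assumption: the 'Unknown' status on the Clusters page comes from this
# condition type on the ManagedCluster resource.
AVAILABLE_CONDITION = "ManagedClusterConditionAvailable"


def availability(cluster: dict) -> str:
    """Return the availability status of a ManagedCluster-like dict."""
    conditions = cluster.get("status", {}).get("conditions", [])
    for cond in conditions:
        if cond.get("type") == AVAILABLE_CONDITION:
            return cond.get("status", "Unknown")
    # No availability condition reported: treat the cluster as Unknown.
    return "Unknown"


def newly_unknown(previous: Dict[str, str], clusters: List[dict]) -> List[str]:
    """Return names of clusters that changed to 'Unknown' since the last poll.

    `previous` maps cluster name to the status seen on the last poll and is
    updated in place, so each transition is reported only once.
    """
    alerts = []
    for cluster in clusters:
        name = cluster["metadata"]["name"]
        current = availability(cluster)
        if current == "Unknown" and previous.get(name) != "Unknown":
            alerts.append(name)
        previous[name] = current
    return alerts
```

      Tracking the previous status means the admin is notified once per transition rather than on every poll while the cluster stays 'Unknown'. A cluster that is already 'Unknown' on the first poll is also reported, because no earlier status is recorded for it.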

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

        1. ACM-1856.webm
          21.65 MB
          Aman Agrawal

            Scott Berens (sberens@redhat.com)
            Aman Agrawal (amagrawa@redhat.com)
            Christine Rizzo
            Subbarao Meduri
            Joydeep Banerjee
            Scott Berens
            Votes: 1
            Watchers: 9
