OpenShift Container Platform (OCP) Strategy / OCPSTRAT-1191

Add Alerts to the Telemetry data collection of hosted clusters


      Goal

      We want to be able to track alerts that are firing on hosted clusters like we do for all other clusters.

      See thread: https://redhat-internal.slack.com/archives/CEG5ZJQ1G/p1708501707940669?thread_ts=1708448823.691509&cid=CEG5ZJQ1G

      Alerts are already included in the list of metrics to collect:
      https://github.com/openshift/cluster-monitoring-operator/blob/master/Documentation/data-collection.md

          # alerts are the key summarization of the system state. They are reported
          # via telemetry to assess their value in detecting upgrade failure causes
          # and also to prevent the need to gather large sets of metrics that are
          # already summarized on the cluster.  Reporting alerts also creates an
          # incentive to improve per cluster alerting for the purposes of preventing
          # upgrades from failing for end users.
          #
          # consumers: (@openshift/openshift-team-olm, @openshift/openshift-team-cluster-manager)
          - '{__name__="ALERTS",alertstate="firing"}'
      

      We need to add them to the HCP metrics collection list as well.
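      To make the selector above concrete, here is a minimal Python sketch of what `{__name__="ALERTS",alertstate="firing"}` would select from a set of series. The helper names (`parse_selector`, `matches`) and the sample series are hypothetical and only for illustration; this is not the telemetry client or HCP code, and it handles only equality matchers with non-empty values.

```python
import re

# Selector copied from the data-collection list quoted above.
ALERTS_SELECTOR = '{__name__="ALERTS",alertstate="firing"}'


def parse_selector(selector: str) -> dict[str, str]:
    """Hypothetical helper: parse {label="value",...} (equality matchers only)."""
    body = selector.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError(f"not a label selector: {selector!r}")
    return dict(re.findall(r'(\w+)="([^"]*)"', body[1:-1]))


def matches(series_labels: dict[str, str], matchers: dict[str, str]) -> bool:
    """Return True when every matcher equals the corresponding series label."""
    return all(series_labels.get(name) == value for name, value in matchers.items())


# Made-up sample series; the alert names are illustrative only.
sample_series = [
    {"__name__": "ALERTS", "alertname": "KubePodCrashLooping",
     "alertstate": "firing", "severity": "warning"},
    {"__name__": "ALERTS", "alertname": "KubeNodeNotReady",
     "alertstate": "pending", "severity": "warning"},
    {"__name__": "up", "job": "apiserver"},
]

matchers = parse_selector(ALERTS_SELECTOR)
selected = [s for s in sample_series if matches(s, matchers)]

for series in selected:
    print(series["alertname"], series["alertstate"])
```

      In this sketch only the firing alert is kept; the pending alert and the unrelated `up` series are dropped, which is the behaviour we would expect from the same entry once it is added to the HCP list.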

      Why is this important?

      We track all alerts so that we have visibility into issues happening in clusters, both internal and external. We need the same visibility for hosted clusters.

      Scenarios

      1. ...

      Acceptance Criteria

      • Dev - Has a valid enhancement if necessary
      • CI - MUST be running successfully with tests automated
      • QE - covered in Polarion test plan and tests implemented
      • Release Technical Enablement - Must have TE slides
      • ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      Open questions:

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Technical Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Enhancement merged: <link to meaningful PR or GitHub Issue>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            azaalouk Adel Zaalouk
            sradco Shirly Radco
            Matthew Werner Matthew Werner