  1. OpenShift Monitoring
  2. MON-1970

[spike] Use Insights to collect more data when alerts are firing


Details

    • Task
    • Resolution: Done
    • Monitoring - Sprint 213

    Description

      MON-1791 concluded that it's OK to use the Insights operator to gather more data. The operator can already collect additional data when a particular alert fires (see https://github.com/openshift/insights-operator/blob/2b6697e230b098207dc09e5b05ea655ced1cb881/pkg/gatherers/conditional/conditional_gatherer.go#L61-L96). For example, Insights collects the API request count resources when the "APIRemovedInNextEUSReleaseInUse" alert is firing, to identify which clients are using API versions that are going to be removed in the next EUS release.
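
      To illustrate the idea of alert-conditioned gathering, here is a minimal sketch. It is not the Insights operator's implementation (the operator is written in Go); `ConditionalRule`, `required_labels`, and `run_conditional_gatherers` are hypothetical names, and the alert dictionaries are assumed to follow the shape returned by the Prometheus `/api/v1/alerts` endpoint (`labels` plus `state`).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ConditionalRule:
    """Hypothetical rule: run `gather` when a matching alert is firing."""

    alert_name: str
    gather: Callable[[], dict]
    # Extra labels the alert must carry, e.g. {"severity": "critical"}.
    required_labels: Dict[str, str] = field(default_factory=dict)

    def matches(self, alert: dict) -> bool:
        # Only alerts in the "firing" state count; "pending" ones are ignored.
        if alert.get("state") != "firing":
            return False
        labels = alert.get("labels", {})
        if labels.get("alertname") != self.alert_name:
            return False
        return all(labels.get(k) == v for k, v in self.required_labels.items())


def run_conditional_gatherers(
    rules: List[ConditionalRule], alerts: List[dict]
) -> Dict[str, dict]:
    """Run each rule's gatherer once if at least one firing alert matches it."""
    results: Dict[str, dict] = {}
    for rule in rules:
        if any(rule.matches(alert) for alert in alerts):
            results[rule.alert_name] = rule.gather()
    return results
```

      A rule for the example above would pair the alert name "APIRemovedInNextEUSReleaseInUse" with a gatherer that returns the API request count resources; a severity constraint can be expressed through `required_labels`.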

      A few examples that come to mind for monitoring:

      • Collect PVC resources when KubePersistentVolumeFillingUp fires with critical severity.
      • Collect the list of down targets from /api/v1/targets when PrometheusTargetSyncFailure fires.
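
      As a rough sketch of the second example, the snippet below extracts the down targets from an already-fetched `/api/v1/targets` response. The function name `down_targets` and the choice of fields to keep are assumptions made for illustration; the field names (`activeTargets`, `health`, `scrapePool`, `scrapeUrl`, `lastError`) follow the Prometheus HTTP API response format. How the response is fetched (authentication, service URL) is deliberately left out.

```python
from typing import Dict, List


def down_targets(targets_response: dict) -> List[Dict[str, str]]:
    """Return a compact summary of every active target whose health is "down"."""
    active = targets_response.get("data", {}).get("activeTargets", [])
    return [
        {
            "scrapePool": target.get("scrapePool", ""),
            "scrapeUrl": target.get("scrapeUrl", ""),
            "lastError": target.get("lastError", ""),
        }
        for target in active
        if target.get("health") == "down"
    ]


# Illustrative payload (made-up values) in the shape of a /api/v1/targets response.
sample_response = {
    "status": "success",
    "data": {
        "activeTargets": [
            {
                "scrapePool": "pool-a",
                "scrapeUrl": "https://10.0.0.1:9100/metrics",
                "health": "up",
                "lastError": "",
            },
            {
                "scrapePool": "pool-b",
                "scrapeUrl": "https://10.0.0.2:8443/metrics",
                "health": "down",
                "lastError": "connection refused",
            },
        ],
        "droppedTargets": [],
    },
}

summary = down_targets(sample_response)
```

      Keeping only the scrape pool, URL, and last error keeps the collected payload small while still pointing at the failing targets.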

      For each critical alert that can be fired by the monitoring stack, we should identify additional data that could be collected to help with diagnosis.

      DoD:

      • Jira tickets created for all critical alerts.


          People

            pnair2 Prashant Balachandran (Inactive)
            spasquie@redhat.com Simon Pasquier