OpenShift Virtualization / CNV-76347

Add design document for alerts_effective_active_at_timestamp_seconds metric

    • CNV I/U Operators Sprint 282

      Create a design document for a metric that reports all pending, firing, and silenced alerts after relabeling.
      Query Thanos and, if it is available, also Alertmanager for the silences.
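
A minimal sketch of how the silenced state could be derived, assuming the silences come from the Alertmanager v2 API (GET /api/v2/silences), where each silence carries a status.state and a list of matchers with name, value, isRegex, and isEqual fields. The function names are hypothetical, and Python's re module only approximates Alertmanager's RE2 matching.

```python
import re


def matcher_matches(matcher, labels):
    """Return True if one silence matcher accepts the alert's labels."""
    value = labels.get(matcher["name"], "")
    if matcher.get("isRegex", False):
        # Alertmanager regex matchers are fully anchored.
        hit = re.fullmatch(matcher["value"], value) is not None
    else:
        hit = value == matcher["value"]
    # isEqual=False negates the matcher (the != and !~ operators).
    return hit if matcher.get("isEqual", True) else not hit


def is_silenced(labels, silences):
    """An alert is silenced if all matchers of some active silence accept it."""
    for silence in silences:
        if silence.get("status", {}).get("state") != "active":
            continue  # skip pending and expired silences
        matchers = silence.get("matchers", [])
        if matchers and all(matcher_matches(m, labels) for m in matchers):
            return True
    return False


def effective_status(state, labels, silences):
    """Map a pending/firing alert to its effective alertstatus value."""
    return "silenced" if is_silenced(labels, silences) else state
```

When Alertmanager is not reachable, passing an empty silence list leaves every alert in its pending or firing state.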

      Report the alert start time as the metric value.
      The name can be alerts_effective_active_at_timestamp_seconds.
      Add a rule_id label carrying the alert hash (fingerprint), so that the metric can be correlated with the alert returned by the API call.
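
One possible way to compute the rule_id value is sketched below. The issue does not say which fields feed the hash, so the inputs (rule name, query expression, and rule labels), the SHA-256 choice, and the 16-character truncation are all assumptions for illustration.

```python
import hashlib


def rule_id(name, query, labels):
    """Hypothetical fingerprint: stable hash over rule name, query, and labels."""
    # Sort the labels so the result does not depend on dict ordering.
    parts = [name, query] + [f"{k}={v}" for k, v in sorted(labels.items())]
    # A NUL separator keeps ("ab", "c") and ("a", "bc") from colliding.
    digest = hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()
    return digest[:16]
```

Whatever inputs the design settles on, the same function has to be applied on the API side so that both ends produce identical identifiers.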

      Example:
      alerts_effective_active_at_timestamp_seconds{alertname=.., alertstatus=.., namespace=.., node=.., group=.., component=.., rule_id=..}
      Perhaps add further labels, such as region and custom user cluster labels.
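
As a rough illustration of the resulting sample, the sketch below converts an RFC 3339 start time (the format used by the activeAt field of the Prometheus rules API) to Unix seconds and renders one line in the Prometheus text exposition format. The helper names are hypothetical; a real exporter would more likely use a Prometheus client library than build strings by hand.

```python
import re
from datetime import datetime

METRIC = "alerts_effective_active_at_timestamp_seconds"


def active_at_seconds(ts):
    """Convert an RFC 3339 timestamp to Unix seconds as a float."""
    m = re.fullmatch(r"(.*?)(?:\.(\d+))?(Z|[+-]\d{2}:\d{2})", ts)
    if m is None:
        raise ValueError(f"unsupported timestamp: {ts!r}")
    base, frac, tz = m.groups()
    # Normalise the fraction to microseconds, since older Python versions
    # only parse fractions of exactly three or six digits.
    frac = (frac or "")[:6].ljust(6, "0")
    tz = "+00:00" if tz == "Z" else tz
    return datetime.fromisoformat(f"{base}.{frac}{tz}").timestamp()


def escape(value):
    """Escape a label value for the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_sample(labels, active_at):
    """Render one metric line with the alert start time as the value."""
    body = ",".join(f'{k}="{escape(v)}"' for k, v in sorted(labels.items()))
    return f"{METRIC}{{{body}}} {active_at_seconds(active_at):.3f}"
```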

      When we query the Thanos/Prometheus rules API (GET <thanos-or-prometheus>/api/v1/rules), the response nests alert instances under their parent alerting rule inside groups.
      Use this response to populate the rule_id label in the metric.
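
The nesting described above could be walked roughly as follows. This is a sketch that assumes the standard response shape of the rules API (data.groups[].rules[].alerts[]), with the JSON already decoded into a dict; the function name is hypothetical.

```python
def iter_alert_instances(response):
    """Yield (group name, parent rule, alert instance) from a rules response."""
    for group in response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            # Recording rules carry no alert instances, so skip them.
            if rule.get("type") != "alerting":
                continue
            for alert in rule.get("alerts", []):
                yield group.get("name", ""), rule, alert
```

Because every alert instance is yielded together with its parent rule, the rule_id can be computed once from the rule's fields and attached to each of that rule's instances.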

              sradco Shirly Radco