OpenShift Monitoring / MON-3266

Report alerting rules without namespace label

    • Type: Task
    • Resolution: Done
    • Priority: Normal
    • Sprint: MON Sprint 238, MON Sprint 240

      From the alerting guidelines (https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md), alerts should include a namespace label. While we can't enforce this rule statically, we can use the telemetry data to spot after the fact which alerts don't comply with the guidelines and file bugs against the non-compliant operators.

      To find out which alerting rules don't follow the guidelines, the steps are:

      1. Spin up a cluster running the latest stable version.
      2. Query the /api/v1/rules endpoint of the Thanos Querier service and extract the names of all product alerts.

      curl -s -H "Authorization: Bearer $(oc whoami -t)" https://thanos-querier.../api/v1/rules | jq -cr '.data.groups | map(.rules) | flatten | map(select(.type == "alerting")) | map(.name) | unique | join("|")'


      3. From https://telemeter-lts.datahub.redhat.com, extract the list of all product alerts that fired without a namespace label, grouped by minor release.

      count by (alertname, version) (
        alerts{alertname=~"<insert list>", namespace=""}
        * on(_id) group_left(version) max by (_id, version) (
          label_replace(id_version_ebs_account_internal:cluster_subscribed{version=~"4.1(2|3|4).*"}, "version", "$1", "version", "^(4\\.\\d+).*$")))

      Replace <insert list> with the output of step 2.
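      Steps 2 and 3 can be combined in a small script. What follows is a minimal Python sketch. It assumes the rules payload has already been fetched, for example with the curl command above, and parsed into a dict. The function names `alert_name_regex` and `build_query` and the sample payload are hypothetical and are meant only as illustration. The first function mirrors the jq filter: it keeps only alerting rules, deduplicates and sorts their names, and joins them with `|`. The second inserts that string into the step 3 query. In the query, the dot in the `label_replace` regex is escaped so that it matches a literal ".".

```python
def alert_name_regex(rules_payload: dict) -> str:
    """Mirror of the jq filter: unique, sorted alerting rule names joined by '|'."""
    names = {
        rule["name"]
        for group in rules_payload["data"]["groups"]
        for rule in group["rules"]
        if rule.get("type") == "alerting"  # recording rules are skipped
    }
    # jq's `unique` also returns a sorted array, so the output order matches.
    return "|".join(sorted(names))


# Step 3 query; {names} is a format placeholder, {{ }} are literal PromQL braces.
QUERY_TEMPLATE = r'''count by (alertname, version) (
  alerts{{alertname=~"{names}", namespace=""}}
  * on(_id) group_left(version) max by (_id, version) (
    label_replace(id_version_ebs_account_internal:cluster_subscribed{{version=~"4.1(2|3|4).*"}}, "version", "$1", "version", "^(4\\.\\d+).*$")))'''


def build_query(names_regex: str) -> str:
    """Fill the alertname matcher of the step 3 query."""
    return QUERY_TEMPLATE.format(names=names_regex)


# Hypothetical sample shaped like a /api/v1/rules response.
sample = {
    "status": "success",
    "data": {"groups": [
        {"name": "g1", "rules": [
            {"type": "alerting", "name": "KubePodCrashLooping"},
            {"type": "recording", "name": "namespace:container_cpu_usage:sum"},
            {"type": "alerting", "name": "AlertmanagerFailedReload"},
        ]},
        {"name": "g2", "rules": [
            {"type": "alerting", "name": "KubePodCrashLooping"},
        ]},
    ]},
}

regex = alert_name_regex(sample)
print(regex)  # AlertmanagerFailedReload|KubePodCrashLooping
print(build_query(regex))
```

      Alert names can only contain letters, digits and underscores, so the joined string can be used as a regex without escaping.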

      DoD:

      • The procedure above is documented in the CMO repository or in rhobs/handbook.
      • OCPBUGS tickets opened against each component that needs to fix its alerts.
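      For the second item, the result of the step 3 query can be turned into a per-alert summary and pasted into each ticket. This is a minimal Python sketch. It assumes the query was run through a Prometheus-compatible /api/v1/query endpoint, which returns the standard instant-vector JSON where each sample's value is a `[timestamp, "string"]` pair. The helpers `summarize` and `report_lines` and the sample data are hypothetical. Mapping each alert to its owning component is outside the scope of this sketch.

```python
from collections import defaultdict


def summarize(query_payload: dict) -> dict[str, dict[str, int]]:
    """Map alertname -> {minor version -> count} from an instant-vector response."""
    summary: dict[str, dict[str, int]] = defaultdict(dict)
    for series in query_payload["data"]["result"]:
        labels = series["metric"]  # only alertname and version survive `count by`
        # The sample value is [timestamp, "value as string"] in the HTTP API.
        summary[labels["alertname"]][labels["version"]] = int(float(series["value"][1]))
    return dict(summary)


def version_key(version: str) -> tuple[int, ...]:
    """Sort "4.12" before "4.13" numerically rather than lexically."""
    return tuple(int(part) for part in version.split("."))


def report_lines(summary: dict[str, dict[str, int]]) -> list[str]:
    """One line per alert, listing the minor versions and their counts."""
    lines = []
    for alertname in sorted(summary):
        per_version = ", ".join(
            f"{version}: {count}"
            for version, count in sorted(summary[alertname].items(), key=lambda kv: version_key(kv[0]))
        )
        lines.append(f"{alertname} fired without a namespace label ({per_version})")
    return lines


# Hypothetical response to the step 3 query.
payload = {
    "status": "success",
    "data": {"resultType": "vector", "result": [
        {"metric": {"alertname": "ExampleAlert", "version": "4.13"}, "value": [1700000000, "5"]},
        {"metric": {"alertname": "ExampleAlert", "version": "4.12"}, "value": [1700000000, "2"]},
        {"metric": {"alertname": "OtherAlert", "version": "4.14"}, "value": [1700000000, "1"]},
    ]},
}

for line in report_lines(summarize(payload)):
    print(line)
```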

              Assignee: Simon Pasquier (spasquie@redhat.com)
              Reporter: Simon Pasquier (spasquie@redhat.com)