Observability and Data Analysis Program / OBSDA-16

Alertmanager: Toggle to send resolved notifications for inhibited alerts


    • Type: Feature
    • Resolution: Won't Do
    • Priority: Major
    • PM Monitoring

      Problem

      Alerts that fire, become inhibited, and then resolve while still inhibited do not send a resolved status to external receivers. This leaves orphaned alerts in external receivers such as PagerDuty. At scale this becomes a great source of toil, as each orphaned alert requires manual investigation before it can be safely resolved by hand.

       

      Example flow:

      Prom = Prometheus
      AM = Alertmanager
      PD = PagerDuty

      1. Prom: alert1 fires
        1. AM gets alert1, routes to PD
        2. PD receives alert1
      2. Prom: alert2 fires
        1. AM gets alert2
        2. AM suppresses alert1 (alert2 inhibits it)
        3. AM routes alert2 to PD
        4. PD receives alert2
        5. PD now has two alerts
          1. Alert1
          2. Alert2
      3. Prom: alert1 resolves
        1. AM resolves alert1
        2. PD receives no notification as the alert is suppressed
        3. PD: alert1 becomes orphaned
      4. Prom: alert2 resolves
        1. AM resolves alert2
        2. PD receives resolved notification for alert2
        3. PD resolves alert2
        4. PD retains alert1 that is now orphaned

      This is undesirable at scale: our pager and dashboards are becoming overwhelmed with orphaned alerts, each of which can require extensive investigation before it is manually marked as resolved.
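The flow above can be reproduced with a small toy model. This is only a sketch for illustration, not Alertmanager code: `PagerDuty`, `ToyAlertmanager`, and the `inhibits` mapping are hypothetical names, and the model assumes that every notification for an inhibited alert, firing or resolved, is dropped.

```python
# Toy model of the example flow. Not Alertmanager code; all names are hypothetical.

class PagerDuty:
    """Stand-in for an external receiver that tracks open incidents."""

    def __init__(self):
        self.open_incidents = set()

    def trigger(self, alert):
        self.open_incidents.add(alert)

    def resolve(self, alert):
        self.open_incidents.discard(alert)


class ToyAlertmanager:
    """Drops every notification, firing or resolved, for an inhibited alert."""

    def __init__(self, receiver, inhibits):
        self.receiver = receiver
        self.inhibits = inhibits  # source alert -> set of alerts it inhibits
        self.firing = set()

    def is_inhibited(self, alert):
        # An alert is inhibited while any alert that inhibits it is firing.
        return any(alert in targets and source in self.firing
                   for source, targets in self.inhibits.items())

    def fire(self, alert):
        self.firing.add(alert)
        if not self.is_inhibited(alert):
            self.receiver.trigger(alert)

    def resolve(self, alert):
        inhibited = self.is_inhibited(alert)
        self.firing.discard(alert)
        if not inhibited:
            self.receiver.resolve(alert)
        # Otherwise the resolved notification is silently dropped.


pd = PagerDuty()
am = ToyAlertmanager(pd, inhibits={"alert2": {"alert1"}})
am.fire("alert1")     # step 1: PD opens alert1
am.fire("alert2")     # step 2: PD opens alert2, alert1 is now inhibited
am.resolve("alert1")  # step 3: resolved notification is suppressed
am.resolve("alert2")  # step 4: PD resolves alert2
print(sorted(pd.open_incidents))  # ['alert1']
```

After all four steps the receiver still holds alert1, which is the orphan described in step 4.4.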

       

      What did you expect to see?

      When an alert that has fired and been routed externally becomes inhibited, its resolved signal should bypass inhibition so that the external receiver is notified that it has resolved.

       


       

      Feature Request

      Add an additional boolean configuration option similar to send_resolved.

      # Whether or not to notify about resolved alerts.
      [ send_resolved: <boolean> | default = true ]
      # Whether or not to notify about resolved alerts that were firing prior to being inhibited.
      [ inhibited_send_resolved: <boolean> | default = false ]
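As a rough sketch of how the two flags might combine, assuming the defaults shown above: `inhibited_send_resolved` is the option proposed in this request and does not exist in Alertmanager, and `ReceiverOptions` and `should_send_resolved` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReceiverOptions:
    # Existing option; default taken from the snippet above.
    send_resolved: bool = True
    # Proposed option from this request; it does not exist in Alertmanager.
    inhibited_send_resolved: bool = False


def should_send_resolved(opts, was_notified, is_inhibited):
    """Decide whether a resolved notification goes out for one alert."""
    if not opts.send_resolved:
        return False
    if not is_inhibited:
        return True
    # Inhibited alert: notify only if the receiver already saw the alert
    # firing and the proposed toggle is enabled.
    return opts.inhibited_send_resolved and was_notified


# Default settings: the resolved notification for an inhibited alert is dropped.
print(should_send_resolved(ReceiverOptions(), was_notified=True, is_inhibited=True))
# With the proposed toggle enabled, it is sent.
print(should_send_resolved(ReceiverOptions(inhibited_send_resolved=True),
                           was_notified=True, is_inhibited=True))
```

Defaulting the new flag to false keeps today's behaviour unless an operator opts in.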

       

      Desired outcome:

      Any alert that has fired and been sent externally should be able to send its resolved status regardless of that alert's state (active, suppressed).

      How it could work:

      1. alert1 fires
      2. AM sends external notification
      3. alert2 fires
      4. AM sends alert2 external notification
      5. AM inhibits alert1
      6. alert1 resolves

      Here we could explore two options:

      1. If alert1 resolves AND was notified externally prior to becoming inhibited:
        1. -> send resolved regardless of inhibited status.
        2. ^ de-couples alert state
      2. If alert1 resolves AND was notified externally prior to becoming inhibited:
        1. -> store resolved state in memory
        2. -> send resolved state when the inhibitor resolves
        3. ^ coupled alert state

      Not knowing the codebase well, I prefer option 1, as it is a truer representation of the state of the system.
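The difference between the two options can be sketched as follows. This is an illustration under assumptions, not a proposal for the actual Alertmanager implementation: `resolve_decoupled`, `CoupledResolver`, and the `send` callback are hypothetical names.

```python
def resolve_decoupled(alert, notified, send):
    """Option 1: send the resolve immediately, ignoring inhibition."""
    if alert in notified:
        send(alert)
        notified.discard(alert)


class CoupledResolver:
    """Option 2: hold the resolve in memory until the inhibitor resolves."""

    def __init__(self, send):
        self.send = send
        self.pending = {}  # inhibitor -> alerts that resolved while inhibited

    def resolve_inhibited(self, alert, inhibitor):
        self.pending.setdefault(inhibitor, []).append(alert)

    def inhibitor_resolved(self, inhibitor):
        # Flush every resolve that was held back by this inhibitor.
        for alert in self.pending.pop(inhibitor, []):
            self.send(alert)


# Option 1: the receiver hears about alert1 as soon as it resolves.
sent_decoupled = []
resolve_decoupled("alert1", {"alert1"}, sent_decoupled.append)

# Option 2: nothing is sent until alert2, the inhibitor, resolves.
sent_coupled = []
coupled = CoupledResolver(sent_coupled.append)
coupled.resolve_inhibited("alert1", inhibitor="alert2")
held_back = list(sent_coupled)        # still empty at this point
coupled.inhibitor_resolved("alert2")  # now alert1's resolve is flushed
```

With option 2 the receiver keeps showing alert1 as open until alert2 resolves, and the held state would be lost if it lives only in memory across a restart. Option 1 avoids both.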

       

      An upstream request has been made here: https://github.com/prometheus/alertmanager/issues/2754 

              rh-ee-rfloren Roger Florén
              dofinn Dominic Finn (Inactive)