OpenShift Bugs: OCPBUGS-74645

Alertmanager sends inhibited notifications after pod 'alertmanager-main-x' restart

    • Bug
    • Resolution: Unresolved
    • 4.20
    • Monitoring

      Upstream issue: https://github.com/prometheus/alertmanager/issues/4064

      I have 2 alerts that have been firing for a long time, and I configured an inhibition rule so that one of the alerts inhibits the other.

      What did you expect to see?
      Alertmanager does not send a notification for the inhibited alert when I restart or reload it.

      What did you see instead? Under which circumstances?
      Alertmanager sent a notification for the alert that should have been inhibited as soon as Alertmanager received it from Prometheus.

      Reproducer (note that it inhibits across rule groups, which goes against best practice)
      Configured against a user workload Alertmanager instance, mainly so that these alerts are the only thing in the logs

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          enableUserWorkload: true
      ---
      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: user-workload-monitoring-config
        namespace: openshift-user-workload-monitoring
      data:
        config.yaml: |
          alertmanager:
            enabled: true
            enableAlertmanagerConfig: true
            logLevel: debug
      ---
      apiVersion: v1
      kind: Namespace
      metadata:
        name: ns1
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        labels:
          app: prometheus-example-app
        name: prometheus-example-app
        namespace: ns1
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: prometheus-example-app
        template:
          metadata:
            labels:
              app: prometheus-example-app
          spec:
            containers:
            - image: ghcr.io/rhobs/prometheus-example-app:0.4.1
              imagePullPolicy: IfNotPresent
              name: prometheus-example-app
      ---
      apiVersion: v1
      kind: Service
      metadata:
        labels:
          app: prometheus-example-app
        name: prometheus-example-app
        namespace: ns1
      spec:
        ports:
        - port: 8080
          protocol: TCP
          targetPort: 8080
          name: web
        selector:
          app: prometheus-example-app
        type: ClusterIP
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: ServiceMonitor
      metadata:
        labels:
          k8s-app: prometheus-example-monitor
        name: prometheus-example-monitor
        namespace: ns1
      spec:
        endpoints:
        - interval: 30s
          port: web
          scheme: http
        selector:
          matchLabels:
            app: prometheus-example-app
      ---
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: example-alert
        namespace: ns1
      spec:
        groups:
        - name: Example_group_1
          rules:
          - alert: Inhibiting rule
            expr: version{job="prometheus-example-app"} > 0
            for: 5m
            labels:
              inhibit: "true"
            annotations:
              summary: "This is an inhibiting rule"
          - alert: Inhibited rule
            expr: version{job="prometheus-example-app"} > 0
            for: 5m
            labels:
              inhibited: "true"
            annotations:
              summary: "This is an inhibited rule"
        - name: Example group 2
          rules:
          - alert: Another inhibited rule
            expr: version{job="prometheus-example-app"} > 0
            for: 5m
            labels:
              inhibited: "true"
            annotations:
              summary: "This is another inhibited rule"
      

      Alertmanager configuration (alertmanager.yaml):

      "global":
        "http_config":
          "proxy_from_environment": true
      "inhibit_rules":
       - "target_matchers":
         - "inhibited = true"
         "source_matchers":
         - "inhibit = true"
      "receivers":
      - "name": "test"
      "route":
        "receiver": "test"
        "group_wait": "15s"
        "group_interval": "1m"
        "repeat_interval": "5m"
      

      Apply the Alertmanager config to the user workload instance:

      oc -n openshift-user-workload-monitoring create secret generic alertmanager-user-workload --from-file=alertmanager.yaml --dry-run=client -o=yaml | oc -n openshift-user-workload-monitoring replace secret --filename=-
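
      Before restarting the pod, it can help to confirm that the secret really holds the configuration above. The following is a minimal sketch, not part of the original report: the function name show_applied_config is hypothetical, and it assumes the secret stores the file under the key alertmanager.yaml (the key that --from-file=alertmanager.yaml would produce) and that a base64 tool supporting -d is available.

```shell
#!/bin/sh
# Hypothetical helper: print the Alertmanager configuration currently stored
# in the user workload secret.
# Assumption: the data key is "alertmanager.yaml", as created by
# --from-file=alertmanager.yaml.
show_applied_config() {
  oc -n openshift-user-workload-monitoring get secret alertmanager-user-workload \
    -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
}
```

      Calling show_applied_config should print the inhibit_rules and route sections shown earlier if the replace succeeded.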
      

      After an Alertmanager restart, the debug logs show an inhibited alert being flushed as active before the inhibiting alert has been received and the inhibition takes effect:

      time=2026-02-13T15:33:17.410Z level=INFO source=cluster.go:691 msg="gossip settled; proceeding" component=cluster elapsed=10.003040452s
      time=2026-02-13T15:33:22.412Z level=DEBUG source=net.go:962 msg="[DEBUG] memberlist: Initiating push/pull sync with:  10.128.2.26:9094" component=cluster
      time=2026-02-13T15:33:22.413Z level=DEBUG source=delegate.go:238 msg=NotifyJoin component=cluster node=01KHBT4TSTBJB2W2PGTJAPCXHZ addr=10.128.2.26:9094
      time=2026-02-13T15:33:22.413Z level=DEBUG source=cluster.go:467 msg=success component=cluster msg=refresh addr=10.128.2.26:9094
      time=2026-02-13T15:33:40.569Z level=DEBUG source=dispatch.go:165 msg="Received alert" component=dispatcher alert="Another inhibited rule[8f0c1e1][active]"
      time=2026-02-13T15:33:40.570Z level=DEBUG source=dispatch.go:530 msg=flushing component=dispatcher aggrGroup={}:{} alerts="[Another inhibited rule[8f0c1e1][active]]"
      time=2026-02-13T15:33:51.875Z level=DEBUG source=dispatch.go:165 msg="Received alert" component=dispatcher alert="Inhibiting rule[ab3b3d0][active]"
      time=2026-02-13T15:33:51.876Z level=DEBUG source=dispatch.go:165 msg="Received alert" component=dispatcher alert="Inhibited rule[ef5d27a][active]"
      time=2026-02-13T15:33:51.884Z level=DEBUG source=dispatch.go:165 msg="Received alert" component=dispatcher alert="Inhibiting rule[ab3b3d0][active]"
      time=2026-02-13T15:33:51.885Z level=DEBUG source=dispatch.go:165 msg="Received alert" component=dispatcher alert="Inhibited rule[ef5d27a][active]"
      time=2026-02-13T15:33:55.556Z level=DEBUG source=dispatch.go:165 msg="Received alert" component=dispatcher alert="Another inhibited rule[8f0c1e1][active]"
      time=2026-02-13T15:34:18.781Z level=DEBUG source=net.go:238 msg="[DEBUG] memberlist: Stream connection from=10.128.2.26:43758" component=cluster
      time=2026-02-13T15:34:40.571Z level=DEBUG source=dispatch.go:530 msg=flushing component=dispatcher aggrGroup={}:{} alerts="[Another inhibited rule[8f0c1e1][active] Inhibited rule[ef5d27a][active] Inhibiting rule[ab3b3d0][active]]"
      time=2026-02-13T15:34:40.571Z level=DEBUG source=notify.go:579 msg="Notifications will not be sent for muted alerts" component=dispatcher alerts="[Another inhibited rule[8f0c1e1][active] Inhibited rule[ef5d27a][active]]" reason=inhibition
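
      The window in the log above can be picked out mechanically. The following is a rough sketch, not part of the original report: the function name flushes_before_source is hypothetical, and it assumes the debug log format shown above (msg="Received alert", msg=flushing, and a trailing alerts="..." field). It prints every dispatcher flush that happened before the inhibiting alert was first received.

```shell
#!/bin/sh
# Hypothetical helper: read Alertmanager debug logs on stdin and print each
# dispatcher flush that occurred before the inhibiting (source) alert was
# first received.
# $1: name of the inhibiting alert, e.g. "Inhibiting rule" from the reproducer.
flushes_before_source() {
  awk -v src="$1" '
    # Stop at the first "Received alert" line for the source alert.
    /msg="Received alert"/ && index($0, "alert=\"" src "[") { exit }
    # Until then, report the timestamp and alert list of every flush.
    /msg=flushing/ {
      ts = $1; sub(/^time=/, "", ts)
      alerts = $0; sub(/^.*alerts="/, "", alerts); sub(/"$/, "", alerts)
      print ts " " alerts
    }
  '
}
```

      Fed the log above with "Inhibiting rule" as the argument, this reports the 15:33:40.570Z flush of "Another inhibited rule", which predates the first "Inhibiting rule" line at 15:33:51.875Z. On a cluster, the input could come from oc logs on the Alertmanager pod; the pod name alertmanager-user-workload-0 and container name alertmanager are assumptions to check against the actual deployment.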
      

      If the inhibit rule does not span rule groups and the inhibiting rule comes first, the inhibited rule does not fire:

      ---
      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: example-alert
        namespace: ns1
      spec:
        groups:
        - name: example
          rules:
          - alert: Inhibiting rule
            expr: version{job="prometheus-example-app"} > 0
            for: 5m
            labels:
              inhibit: "true"
            annotations:
              summary: "This is an inhibiting rule"
          - alert: Inhibited rule
            expr: version{job="prometheus-example-app"} > 0
            for: 5m
            labels:
              inhibited: "true"
            annotations:
              summary: "This is an inhibited rule"
      

              rh-ee-amrini Ayoub Mrini
              rhn-support-nigsmith Nigel Smith
              Junqi Zhao Junqi Zhao