OCPBUGS-59131

Silencing a user project alert with the UWM Alertmanager API leaves the alert shown as firing on the admin console


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.19.z, 4.20.0
    • Observability UI
    • Quality / Stability / Reliability
    • Low
    • Sprint 280
    • 1

      Description of problem:

      Enable user workload monitoring (UWM):

      apiVersion: v1
      kind: ConfigMap
      metadata:
        name: cluster-monitoring-config
        namespace: openshift-monitoring
      data:
        config.yaml: |
          enableUserWorkload: true

      and enable the UWM Alertmanager:

      $ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
      ...
      data:
        config.yaml: |
          alertmanager:
            enabled: true

      Create a user project and deploy a PrometheusRule whose alert will fire:

      $ oc new-project ns1
      $ oc apply -f rules.yaml 

      Content of rules.yaml:

      apiVersion: monitoring.coreos.com/v1
      kind: PrometheusRule
      metadata:
        name: example-alert
        namespace: ns1
      spec:
        groups:
        - name: example
          rules:
          - alert: TestAlert
            expr: vector(1)
            labels:
              severity: none
            annotations:
              message: This is an alert meant to ensure that the entire alerting pipeline is functional. 

      The user project alert is found only in the UWM Alertmanager, not in the platform Alertmanager:

      $ token=`oc create token prometheus-user-workload -n openshift-user-workload-monitoring`
      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts?&filter={alertname="TestAlert"}' | jq
      []
      
      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq
      [
        {
          "annotations": {
            "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
          },
          "endsAt": "2025-07-09T08:16:42.919Z",
          "fingerprint": "348490d73f8513a0",
          "receivers": [
            {
              "name": "Default"
            }
          ],
          "startsAt": "2025-07-09T07:57:42.919Z",
          "status": {
            "inhibitedBy": [],
            "mutedBy": [],
            "silencedBy": [],
            "state": "active"
          },
          "updatedAt": "2025-07-09T08:12:42.931Z",
          "generatorURL": "https://console-openshift-console.apps.***.qe.devcluster.openshift.com/monitoring/graph?g0.expr=vector%281%29&g0.tab=1",
          "labels": {
            "alertname": "TestAlert",
            "namespace": "ns1",
            "severity": "none"
          }
        }
      ]
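The first query above passes the filter as `?&filter={alertname="TestAlert"}`, which has a stray `&` and depends on curl treating the braces as a URL glob (curl does this unless `-g` is given). A more explicit form percent-encodes the matcher. The following is only a sketch: `alerts_url` is a hypothetical helper name, and it assumes the alert name contains no characters that need further escaping.

```shell
#!/bin/sh
# alerts_url is a hypothetical helper, not part of oc, curl, or amtool.
# It prints an Alertmanager v2 alerts URL whose filter is the matcher
# alertname="<name>", percent-encoded (= as %3D, " as %22).
# $1: Alertmanager base URL, $2: alert name (assumed to need no extra escaping).
alerts_url() {
  printf '%s/api/v2/alerts?filter=alertname%%3D%%22%s%%22\n' "$1" "$2"
}

# Build the URL for the platform Alertmanager service used in this report.
alerts_url https://alertmanager-main.openshift-monitoring.svc:9094 TestAlert
```

The printed URL can be passed to the same `curl -k -H "Authorization: Bearer $token"` invocation used in the report.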

      As the screenshot shows, the platform alert Watchdog and the user project alert TestAlert are both firing: https://drive.google.com/file/d/1QujxTWC741DJXbi7-XGcOizrOENg-dYn/view?usp=drive_link

      Silence Watchdog through the platform Alertmanager API:

      $ oc -n openshift-monitoring rsh -c alertmanager alertmanager-main-0
      sh-5.1$ amtool silence add 'alertname=Watchdog' --author="nobody" --duration="1h"  --comment="silence Watchdog" --alertmanager.url=http://localhost:9093
      e7191a2f-5654-45fd-a46b-868c4e6a1d80
      sh-5.1$ amtool silence query --alertmanager.url=http://localhost:9093
      ID                                    Matchers              Ends At                  Created By  Comment           
      e7191a2f-5654-45fd-a46b-868c4e6a1d80  alertname="Watchdog"  2025-07-09 09:06:16 UTC  nobody      silence Watchdog 

      Silence TestAlert through the UWM Alertmanager API:

      $ oc -n openshift-user-workload-monitoring rsh -c alertmanager alertmanager-user-workload-0
      sh-5.1$ amtool silence add 'alertname=TestAlert' --author="nobody" --duration="1h"  --comment="silence TestAlert" --alertmanager.url=http://localhost:9093
      aa678d4f-32d8-42c5-a10f-4cdaa5664c04
      sh-5.1$ amtool silence query --alertmanager.url=http://localhost:9093
      ID                                    Matchers               Ends At                  Created By  Comment            
      aa678d4f-32d8-42c5-a10f-4cdaa5664c04  alertname="TestAlert"  2025-07-09 09:04:08 UTC  nobody      silence TestAlert 
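The silences above are created with amtool. For reference, the same kind of silence can also be expressed as a request body for the Alertmanager v2 endpoint `POST /api/v2/silences`. The sketch below only builds that body: `silence_payload` is a hypothetical helper, the timestamps are illustrative rather than taken from the cluster, and the arguments are assumed to contain no double quotes or backslashes.

```shell
#!/bin/sh
# silence_payload is a hypothetical helper, not part of amtool or oc.
# It prints a JSON body for POST /api/v2/silences with a single equality
# matcher on alertname.
# $1: alert name, $2: startsAt, $3: endsAt (RFC 3339), $4: author, $5: comment.
# Arguments are inserted verbatim, so they must not contain " or \.
silence_payload() {
  printf '{"matchers":[{"name":"alertname","value":"%s","isRegex":false,"isEqual":true}],' "$1"
  printf '"startsAt":"%s","endsAt":"%s","createdBy":"%s","comment":"%s"}\n' "$2" "$3" "$4" "$5"
}

# Example with the author and comment used in this report (timestamps are illustrative).
silence_payload TestAlert 2025-07-09T08:04:08Z 2025-07-09T09:04:08Z nobody 'silence TestAlert'

# Assumed usage, from any pod that can reach the API and has curl:
#   silence_payload ... | curl -s -X POST -H 'Content-Type: application/json' \
#     --data @- http://localhost:9093/api/v2/silences
```

Whether a silence created this way appears on the admin console any differently from one created with amtool has not been checked here.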

      As the screenshot shows (https://drive.google.com/file/d/1BlKAq8-ctYW8I-QHgz40ZklxlqQijgjl/view?usp=drive_link), only the Watchdog alert appears as silenced on the admin console UI. The user project alert TestAlert does not: it still shows as firing, even though the API reports it as silenced:

      $ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq
      [
        {
          "annotations": {
            "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
          },
          "endsAt": "2025-07-09T09:09:12.919Z",
          "fingerprint": "348490d73f8513a0",
          "receivers": [
            {
              "name": "Default"
            }
          ],
          "startsAt": "2025-07-09T07:57:42.919Z",
          "status": {
            "inhibitedBy": [],
            "mutedBy": [],
            "silencedBy": [
              "1ee96539-da1b-453f-8cff-b81373e22e75"
            ],
            "state": "suppressed"
          },
          "updatedAt": "2025-07-09T09:05:12.927Z",
          "generatorURL": "https://console-openshift-console.apps.**.qe.devcluster.openshift.com/monitoring/graph?g0.expr=vector%281%29&g0.tab=1",
          "labels": {
            "alertname": "TestAlert",
            "namespace": "ns1",
            "severity": "none"
          }
        }
      ] 

      It is unclear whether this is a UWM Alertmanager limitation. The bug is filed against Monitoring first, although it looks more like a management console bug.

      Version-Release number of selected component (if applicable):

      Checked on 4.19 and 4.20.

      How reproducible:

      Always

      Steps to Reproduce:

      1. See the description above.

      Actual results:

      After a user project alert is silenced through the UWM Alertmanager API, the alert still shows as firing on the admin console.

      Expected results:

      The alert shows as silenced on the admin console.

      Additional info:

      A user project alert is silenced successfully when the silence is created from the admin console.

              gbernal@redhat.com Gabriel Bernal
              juzhao@redhat.com Junqi Zhao