Bug
Resolution: Unresolved
Normal
4.19.z, 4.20.0
Quality / Stability / Reliability
Low
Sprint 280
Description of problem:
Enable user workload monitoring (UWM):
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
Also enable the UWM Alertmanager:
$ oc -n openshift-user-workload-monitoring get cm user-workload-monitoring-config -oyaml
...
data:
  config.yaml: |
    alertmanager:
      enabled: true
Create a user project and deploy a PrometheusRule whose alert fires:
$ oc new-project ns1
$ oc apply -f rules.yaml
rules.yaml content:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alert
  namespace: ns1
spec:
  groups:
  - name: example
    rules:
    - alert: TestAlert
      expr: vector(1)
      labels:
        severity: none
      annotations:
        message: This is an alert meant to ensure that the entire alerting pipeline is functional.
The user project alert can be found only in the UWM Alertmanager, not in the platform Alertmanager:
$ token=`oc create token prometheus-user-workload -n openshift-user-workload-monitoring`
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-main.openshift-monitoring.svc:9094/api/v2/alerts?&filter={alertname="TestAlert"}' | jq
[]
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq
[
  {
    "annotations": {
      "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
    },
    "endsAt": "2025-07-09T08:16:42.919Z",
    "fingerprint": "348490d73f8513a0",
    "receivers": [
      {
        "name": "Default"
      }
    ],
    "startsAt": "2025-07-09T07:57:42.919Z",
    "status": {
      "inhibitedBy": [],
      "mutedBy": [],
      "silencedBy": [],
      "state": "active"
    },
    "updatedAt": "2025-07-09T08:12:42.931Z",
    "generatorURL": "https://console-openshift-console.apps.***.qe.devcluster.openshift.com/monitoring/graph?g0.expr=vector%281%29&g0.tab=1",
    "labels": {
      "alertname": "TestAlert",
      "namespace": "ns1",
      "severity": "none"
    }
  }
]
As shown in the screenshot, the platform alert Watchdog and the user project alert TestAlert are both firing: https://drive.google.com/file/d/1QujxTWC741DJXbi7-XGcOizrOENg-dYn/view?usp=drive_link
Silence Watchdog through the platform Alertmanager API:
$ oc -n openshift-monitoring rsh -c alertmanager alertmanager-main-0
sh-5.1$ amtool silence add 'alertname=Watchdog' --author="nobody" --duration="1h" --comment="silence Watchdog" --alertmanager.url=http://localhost:9093
e7191a2f-5654-45fd-a46b-868c4e6a1d80
sh-5.1$ amtool silence query --alertmanager.url=http://localhost:9093
ID                                    Matchers              Ends At                  Created By  Comment
e7191a2f-5654-45fd-a46b-868c4e6a1d80  alertname="Watchdog"  2025-07-09 09:06:16 UTC  nobody      silence Watchdog
Silence TestAlert through the UWM Alertmanager API:
$ oc -n openshift-user-workload-monitoring rsh -c alertmanager alertmanager-user-workload-0
sh-5.1$ amtool silence add 'alertname=TestAlert' --author="nobody" --duration="1h" --comment="silence TestAlert" --alertmanager.url=http://localhost:9093
aa678d4f-32d8-42c5-a10f-4cdaa5664c04
sh-5.1$ amtool silence query --alertmanager.url=http://localhost:9093
ID                                    Matchers               Ends At                  Created By  Comment
aa678d4f-32d8-42c5-a10f-4cdaa5664c04  alertname="TestAlert"  2025-07-09 09:04:08 UTC  nobody      silence TestAlert
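For comparison, the same silence can be expressed directly as an Alertmanager API v2 request body for `POST /api/v2/silences`, which is roughly what `amtool silence add` sends. The sketch below only builds that body; `build_silence` is a hypothetical helper name, and sending it (with any HTTP client, to the same `--alertmanager.url` used above) is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_silence(alertname: str, author: str, comment: str,
                  duration: timedelta,
                  now: Optional[datetime] = None) -> dict:
    """Build a body for POST /api/v2/silences with one alertname matcher.

    Roughly equivalent to:
      amtool silence add 'alertname=<alertname>' --author=... --duration=... --comment=...
    """
    start = now if now is not None else datetime.now(timezone.utc)
    return {
        # Single equality matcher: alertname="<alertname>".
        "matchers": [
            {"name": "alertname", "value": alertname,
             "isRegex": False, "isEqual": True},
        ],
        # RFC 3339 timestamps; an aware UTC datetime serializes with +00:00.
        "startsAt": start.isoformat(),
        "endsAt": (start + duration).isoformat(),
        "createdBy": author,
        "comment": comment,
    }
```

Creating the silence through the API rather than amtool should not change how the UWM Alertmanager applies it; both paths end up as a silence stored in that Alertmanager only.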
As shown in the screenshot (https://drive.google.com/file/d/1BlKAq8-ctYW8I-QHgz40ZklxlqQijgjl/view?usp=drive_link), the admin console shows the Watchdog alert as silenced, but the user project alert TestAlert still shows as firing. The UWM Alertmanager API, however, reports TestAlert as silenced:
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer $token" 'https://alertmanager-user-workload.openshift-user-workload-monitoring.svc:9095/api/v2/alerts' | jq
[
  {
    "annotations": {
      "message": "This is an alert meant to ensure that the entire alerting pipeline is functional."
    },
    "endsAt": "2025-07-09T09:09:12.919Z",
    "fingerprint": "348490d73f8513a0",
    "receivers": [
      {
        "name": "Default"
      }
    ],
    "startsAt": "2025-07-09T07:57:42.919Z",
    "status": {
      "inhibitedBy": [],
      "mutedBy": [],
      "silencedBy": [
        "1ee96539-da1b-453f-8cff-b81373e22e75"
      ],
      "state": "suppressed"
    },
    "updatedAt": "2025-07-09T09:05:12.927Z",
    "generatorURL": "https://console-openshift-console.apps.**.qe.devcluster.openshift.com/monitoring/graph?g0.expr=vector%281%29&g0.tab=1",
    "labels": {
      "alertname": "TestAlert",
      "namespace": "ns1",
      "severity": "none"
    }
  }
]
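The API's notion of "silenced" can be read straight from this response: `status.state` is `suppressed` and `status.silencedBy` lists a silence ID. A minimal Python sketch of that check, assuming the JSON array returned by `/api/v2/alerts` has already been parsed into a list; `silenced_alerts` is a hypothetical helper name.

```python
def silenced_alerts(alerts: list) -> dict:
    """Map alertname -> silence IDs for alerts the Alertmanager reports as silenced.

    In the API v2 /api/v2/alerts response, a silenced alert has
    status.state == "suppressed" and a non-empty status.silencedBy.
    Alerts suppressed only by inhibition (status.inhibitedBy) are skipped.
    """
    result = {}
    for alert in alerts:
        status = alert.get("status", {})
        silence_ids = status.get("silencedBy") or []
        if status.get("state") == "suppressed" and silence_ids:
            result[alert["labels"]["alertname"]] = list(silence_ids)
    return result
```

Applied to the response above, it returns `{"TestAlert": ["1ee96539-da1b-453f-8cff-b81373e22e75"]}`; applied to the earlier response, where the state is `active` and `silencedBy` is empty, it returns an empty dict.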
It is unclear whether this is a UWM Alertmanager limitation. Filing this bug against Monitoring first, although it looks more like a Management Console bug.
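How the console decides that an alert is silenced is not described in this report. One possible explanation, unconfirmed, is that the alert list matches alert labels only against silences fetched from the platform Alertmanager, so a silence that exists only in the UWM Alertmanager never matches. The hypothetical sketch below shows that matching step using Alertmanager API v2 matcher fields (`name`, `value`, `isRegex`, `isEqual`); `matcher_matches` and `matching_silences` are illustrative names, not console code.

```python
import re


def matcher_matches(matcher: dict, labels: dict) -> bool:
    """Evaluate one API v2 matcher against an alert's labels.

    Regex matchers are fully anchored, as in Alertmanager; a missing label
    is treated as the empty string. isEqual=False negates the result
    (the != and !~ operators).
    """
    value = labels.get(matcher["name"], "")
    if matcher.get("isRegex", False):
        hit = re.fullmatch(matcher["value"], value) is not None
    else:
        hit = value == matcher["value"]
    return hit if matcher.get("isEqual", True) else not hit


def matching_silences(labels: dict, silences: list) -> list:
    """Return IDs of active silences whose matchers all match the labels."""
    return [
        s["id"] for s in silences
        if s.get("status", {}).get("state") == "active"
        and all(matcher_matches(m, labels) for m in s["matchers"])
    ]
```

With the two silences created in the steps above, TestAlert's labels match nothing in the platform Alertmanager's silence list alone, and match `aa678d4f-32d8-42c5-a10f-4cdaa5664c04` once the UWM Alertmanager's silences are included.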
Version-Release number of selected component (if applicable):
Checked on 4.19 and 4.20.
How reproducible:
Always
Steps to Reproduce:
1. See the description above.
Actual results:
After a user project alert is silenced through the UWM Alertmanager API, the alert still shows as firing in the admin console.
Expected results:
The alert shows as silenced in the admin console.
Additional info:
A user project alert is silenced successfully when the silence is created from the admin console.