Bug
Resolution: Done-Errata
Priority: Normal
Affects Version(s): 4.12.0, 4.11.0
Description of problem:
Cluster and user workload Alertmanager instances inadvertently become peered during an upgrade.
Version-Release number of selected component (if applicable):
How reproducible:
Infrequently - the customer observed this on 3 clusters out of 15.
Steps to Reproduce:
1. Deploy user workload monitoring:
~~~
config.yaml: |
  enableUserWorkload: true
  prometheusK8s:
~~~
2. Deploy the user workload Alertmanager:
~~~
name: user-workload-monitoring-config
namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    alertmanager:
      enabled: true
~~~
3. Upgrade the cluster.
4. Verify the state of the Alertmanager clusters:
~~~
$ oc exec -n openshift-monitoring alertmanager-main-0 -- amtool cluster show -o json --alertmanager.url=http://localhost:9093
~~~
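The check above only queries the cluster Alertmanager. A quick sketch to compare peer counts on both sides (assuming the default pod names `alertmanager-main-0` and `alertmanager-user-workload-0`, that `jq` is available locally, and that the `amtool` JSON output exposes a `peers` array):

```shell
# Count gossip peers on each Alertmanager cluster. In a healthy state each
# cluster reports only its own 2 peers; 4 peers on either side means the
# two clusters have merged into one gossip mesh.
for ns_pod in openshift-monitoring/alertmanager-main-0 \
              openshift-user-workload-monitoring/alertmanager-user-workload-0; do
  ns=${ns_pod%/*}; pod=${ns_pod#*/}
  echo "== $ns/$pod =="
  oc exec -n "$ns" "$pod" -- amtool cluster show -o json \
      --alertmanager.url=http://localhost:9093 | jq '.peers | length'
done
```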
Actual results:
The Alertmanager cluster shows 4 peers.
Expected results:
There should be 2 separate Alertmanager clusters of 2 peers each.
Additional info:
Mitigation steps: scaling one of the Alertmanager statefulsets down to 0 and then back up restores the expected configuration (i.e. 2 separate Alertmanager clusters). The customer then added NetworkPolicies to prevent Alertmanager gossip between the namespaces.
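A minimal sketch of the NetworkPolicy-based mitigation, assuming Alertmanager gossips on port 9094 and serves its API on 9093, and that the user workload pods carry the label `app.kubernetes.io/name: alertmanager` (labels and rules are illustrative; the policy the customer actually applied is not included in this report, and additional ingress rules may be needed for other legitimate traffic):

```yaml
# Hypothetical NetworkPolicy: allow gossip (9094) to the user workload
# Alertmanager only from pods in the same namespace, while still allowing
# API traffic (9093) from anywhere. NetworkPolicies are allow-lists, so
# once a pod is selected, any traffic not matched here is dropped.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-alertmanager-gossip
  namespace: openshift-user-workload-monitoring
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: alertmanager
  policyTypes:
  - Ingress
  ingress:
  # Gossip only from pods in this namespace.
  - from:
    - podSelector: {}
    ports:
    - protocol: TCP
      port: 9094
    - protocol: UDP
      port: 9094
  # Alertmanager API from any source.
  - ports:
    - protocol: TCP
      port: 9093
```

A mirrored policy in openshift-monitoring would keep the cluster Alertmanager's gossip in-namespace as well.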