Type: Bug
Resolution: Done-Errata
Priority: Normal
Affects Versions: 4.12.0, 4.11.0
Component: Quality / Stability / Reliability
Severity: Moderate
Sprint: MON Sprint 244
Status: Done
Issue category: Bug Fix
Description of problem:
The cluster monitoring and user-workload Alertmanager instances inadvertently become peered into a single gossip cluster during a cluster upgrade.
Version-Release number of selected component (if applicable):
How reproducible:
Infrequently: the customer observed this on 3 clusters out of 15.
Steps to Reproduce:
Deploy user workload monitoring:
~~~
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
~~~
Deploy the user workload Alertmanager:
~~~
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    alertmanager:
      enabled: true
~~~
Upgrade the cluster.
Verify the state of the Alertmanager clusters:
~~~
$ oc exec -n openshift-monitoring alertmanager-main-0 -- amtool cluster show -o json --alertmanager.url=http://localhost:9093
~~~
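The same check can be scripted by counting the entries in the JSON output (a sketch; it assumes the `amtool` JSON exposes a `peers` array and that `jq` is available on the host running `oc`):
~~~
$ oc exec -n openshift-monitoring alertmanager-main-0 -- \
    amtool cluster show -o json --alertmanager.url=http://localhost:9093 \
  | jq '.peers | length'
~~~
A result of 2 indicates the expected state (each Alertmanager pair gossips only with itself); 4 indicates the merged state reported in this bug.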
Actual results:
The `amtool cluster show` output lists 4 peers: the cluster monitoring and user-workload Alertmanager instances have merged into a single 4-member gossip cluster.
Expected results:
Two separate Alertmanager clusters with 2 peers each (the cluster monitoring pair and the user-workload pair).
Additional info:
Mitigation steps: scaling one of the Alertmanager StatefulSets down to 0 and then back up restores the expected configuration (i.e. two separate Alertmanager clusters). The customer then added NetworkPolicies to prevent Alertmanager gossip between the two namespaces.
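A NetworkPolicy along these lines could implement that workaround (a minimal sketch only, not the customer's actual policy; the pod label selector, and the assumption that gossip uses port 9094 while the web/API listens on 9093, should be verified against the running pods):
~~~
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: restrict-alertmanager-gossip
  namespace: openshift-user-workload-monitoring
spec:
  # Label is an assumption; check the actual Alertmanager pod labels.
  podSelector:
    matchLabels:
      app.kubernetes.io/name: alertmanager
  policyTypes:
    - Ingress
  ingress:
    # Allow gossip (port 9094) only from pods in the same namespace.
    - from:
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 9094
        - protocol: UDP
          port: 9094
    # Keep the web/API port reachable (e.g. for Prometheus); the
    # in-cluster service may front it with a proxy on another port.
    - ports:
        - protocol: TCP
          port: 9093
~~~
An equivalent policy would be applied in the openshift-monitoring namespace to cover the main Alertmanager pair.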