Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: 4.15.0
Affects Version/s: 4.12.0, 4.11.0
Component/s: Monitoring
Labels:
None

Severity:
Moderate
Regression:
No
Sprint:
MON Sprint 244
sprint_count:
1
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Release Note Text:

* Previously, instances of Alertmanager for core platform monitoring and for user-defined projects could inadvertently become peered during an upgrade. This issue could occur when multiple Alertmanager instances were deployed in the same cluster. This release fixes the issue by adding a `--cluster.label` flag to Alertmanager that helps to block any traffic that is not intended for the cluster.
(link:https://issues.redhat.com/browse/OCPBUGS-18707[*OCPBUGS-18707*])

Release Note Type:
Bug Fix
Release Note Status:
Done
Target Version:

4.15.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:
PX Priority Data:

Description of problem:

The cluster (platform) and user workload Alertmanager instances inadvertently become peered during an upgrade.

Version-Release number of selected component (if applicable):

How reproducible:

Infrequently. The customer observed this on 3 clusters out of 15.

Steps to Reproduce:

Deploy user workload monitoring:

~~~
 config.yaml: |
    enableUserWorkload: true
    prometheusK8s:
~~~

Deploy the user workload Alertmanager:

~~~
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    alertmanager:
      enabled: true 
~~~

Upgrade the cluster.

Verify the state of the Alertmanager clusters:

~~~
$ oc exec -n openshift-monitoring alertmanager-main-0 -- amtool cluster show -o json --alertmanager.url=http://localhost:9093
~~~
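
The JSON printed by the command above can also be checked programmatically. The following is a minimal sketch, assuming the output follows the Alertmanager v2 cluster status shape (an object with a `peers` array whose entries carry `name` and `address`). The helper name `count_peers`, the sample payload, and the expected count of 2 are illustrative assumptions, not output captured from the affected clusters.

```python
import json


def count_peers(cluster_status_json: str) -> int:
    """Return the number of gossip peers in an `amtool cluster show -o json` payload."""
    status = json.loads(cluster_status_json)
    # A missing or null "peers" key is treated as zero peers.
    return len(status.get("peers") or [])


# Hypothetical payload illustrating the unexpected state: four peers in one cluster.
sample = json.dumps({
    "name": "example-cluster-id",
    "status": "ready",
    "peers": [
        {"name": "peer-a", "address": "10.128.0.10:9094"},
        {"name": "peer-b", "address": "10.129.0.11:9094"},
        {"name": "peer-c", "address": "10.130.0.12:9094"},
        {"name": "peer-d", "address": "10.131.0.13:9094"},
    ],
})

EXPECTED_PEERS = 2  # assumption: two replicas per Alertmanager statefulset

found = count_peers(sample)
if found != EXPECTED_PEERS:
    print(f"unexpected peer count: {found} (expected {EXPECTED_PEERS})")
```

In practice the string passed to `count_peers` would be the captured standard output of the `oc exec ... amtool cluster show -o json` command rather than the hard-coded sample.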

Actual results:

Alertmanager shows 4 peers.

Expected results:

There should be 2 separate Alertmanager clusters, each with 2 peers.

Additional info:

Mitigation steps: 

Scaling one of the Alertmanager statefulsets down to 0 and then back up restores the expected configuration (i.e. 2 separate Alertmanager clusters).

The customer then added network policies to prevent Alertmanager gossip between namespaces.
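
One way to confirm that the mitigation took effect is to compare the peer addresses reported by Alertmanager against the pod IPs of its own namespace. The following is a rough sketch under stated assumptions: the JSON has a `peers` array with an `address` field in `host:port` form, and the namespace's pod IPs have been collected separately (for example from `oc get pods -n openshift-monitoring -o wide`). The function names and the sample addresses are hypothetical.

```python
import json
from typing import List, Set


def peer_host(address: str) -> str:
    """Strip the port from an address such as '10.128.0.10:9094' or '[fd00::1]:9094'."""
    if ":" not in address:
        return address
    # Assumption: the address always ends in ':<port>'.
    host, _, _port = address.rpartition(":")
    return host.strip("[]")


def foreign_peers(cluster_status_json: str, own_pod_ips: Set[str]) -> List[str]:
    """Return peer addresses that do not belong to the given set of pod IPs."""
    status = json.loads(cluster_status_json)
    return [
        peer["address"]
        for peer in status.get("peers") or []
        if peer_host(peer["address"]) not in own_pod_ips
    ]


# Hypothetical data: two platform pods plus two peers from another namespace.
status_json = json.dumps({
    "peers": [
        {"name": "peer-a", "address": "10.128.0.10:9094"},
        {"name": "peer-b", "address": "10.129.0.11:9094"},
        {"name": "peer-c", "address": "10.130.0.12:9094"},
        {"name": "peer-d", "address": "10.131.0.13:9094"},
    ],
})
platform_pod_ips = {"10.128.0.10", "10.129.0.11"}

leaked = foreign_peers(status_json, platform_pod_ips)
print(leaked)
```

An empty list would indicate that the Alertmanager instance only gossips with pods from its own namespace.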

links to

openshift/prometheus-operator#255: OCPBUGS-18707: [bot] Bump openshift/prometheus-operator to v0.69.0

RHEA-2023:7198 rpm

Assignee: Jayapriya Pai

Reporter: Nigel Smith

QA Contact: Junqi Zhao

Doc Contact: Brian Burt

Contributors: Simon Pasquier

Votes: 0

Watchers: 8

Created: 2023/09/08 12:12 PM

Updated: 2024/02/27 8:59 PM

Resolved: 2024/02/27 8:59 PM
