-
Bug
-
Resolution: Done
-
Critical
-
ACM 2.10.2
-
1
-
False
-
None
-
False
-
-
-
MCO Sprint 26
-
None
Description of problem:
This was noticed while investigating [this issue|https://issues.redhat.com/browse/OHSS-35716?focusedId=25131120&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-25131120]. Endpointmetrics applies the Alertmanager config, alongside the cert-related fields, in order to enable platform and user workload alert forwarding to the hub Alertmanager. However, Endpointmetrics does not trigger a reconciliation when the CMO config object changes.
As such, if another process changes the CMO config, we lose all alert forwarding until Endpointmetrics reconciles again. This impacts all ROSA/Managed OpenShift clusters, as they intentionally apply changes to the CMO config object via a Hive SyncSet every 2 hours.
Version-Release number of selected component (if applicable):
All versions, as far as I can see.
How reproducible:
Steps to Reproduce:
- Create an OCP cluster, register it to a hub
- Note the changes to the CMO cluster-monitoring-config config map in the spoke cluster
- Delete the added additionalAlertmanagerConfigs entry under prometheusK8s in that config map
Actual results:
- Endpointmetrics will not correct the change
Expected results:
- Endpointmetrics should reapply the alertmanager configuration
Additional info:
- We need to watch for changes on the cluster-monitoring-config object in endpointmetrics
- Reconciliation should only be additive - we should not be removing anything from the configuration
- AdditionalAlertmanagers should be treated as an array whose entries are checked by value (the value being the endpoint), to ensure that if two operators touch the same field, neither overwrites the other's entries when appending
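The additive, by-value merge described above could look roughly like the Go sketch below. It is an illustration only, not the Endpointmetrics code: the `AlertmanagerConfig` type, its single `Endpoint` field, the `mergeAdditive` helper, and the example hostnames are all hypothetical simplifications (the real CMO entries carry more fields). Entries that are already present are left untouched and nothing is ever removed. The `changed` flag tells the caller whether the config map needs to be written back at all.

```go
package main

import "fmt"

// AlertmanagerConfig is a hypothetical, simplified stand-in for one entry in
// the additionalAlertmanagerConfigs list. Only the fields needed for the
// illustration are modelled.
type AlertmanagerConfig struct {
	Scheme   string
	Endpoint string // assumed to identify the entry "by value"
}

// mergeAdditive returns existing plus every desired entry whose endpoint is
// not already present. It never removes or rewrites existing entries, so
// entries owned by other operators survive. The boolean reports whether
// anything was appended.
func mergeAdditive(existing, desired []AlertmanagerConfig) ([]AlertmanagerConfig, bool) {
	merged := make([]AlertmanagerConfig, len(existing), len(existing)+len(desired))
	copy(merged, existing)

	// Index the endpoints we already have so the check is by value.
	seen := make(map[string]struct{}, len(existing))
	for _, c := range existing {
		seen[c.Endpoint] = struct{}{}
	}

	changed := false
	for _, d := range desired {
		if _, ok := seen[d.Endpoint]; ok {
			continue // already configured; do not duplicate or overwrite
		}
		merged = append(merged, d)
		seen[d.Endpoint] = struct{}{}
		changed = true
	}
	return merged, changed
}

func main() {
	// State after another process rewrote the config: only its entry is left.
	existing := []AlertmanagerConfig{
		{Scheme: "https", Endpoint: "other-am.example.com:9093"},
	}
	// The entry Endpointmetrics wants for hub forwarding (hypothetical host).
	desired := []AlertmanagerConfig{
		{Scheme: "https", Endpoint: "hub-am.example.com:443"},
	}

	merged, changed := mergeAdditive(existing, desired)
	fmt.Println("needs update:", changed)
	for _, c := range merged {
		fmt.Println(c.Scheme, c.Endpoint)
	}
}
```

Running the same merge a second time against its own result appends nothing, so a reconcile triggered by a watch on cluster-monitoring-config would not loop on its own writes.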