Bug
Resolution: Done
Critical
None
Observability Sprint 2023-08
Important
No
Description of problem:
While deploying 3000+ SNOs with ACM and ZTP, we found occasional clusters reporting the common-config-policy as noncompliant. The violation is that the cluster-monitoring-config ConfigMap was modified. It appears that Observability (OBS) did this by re-rendering the config file through a YAML serializer that emits unexpected null values. The cluster-monitoring-config stores its YAML as a single string inside the ConfigMap, so when the ACM policy compares the modified string against the expected string, the cluster is reported as noncompliant. Some configuration also appears to have been dropped: in the example under Additional info, the "enabled: false" settings for grafana and alertmanagerMain are missing from the modified file. The fix should be for OBS not to re-render the config file at all, and to leave it untouched when the annotation that prevents OBS from rolling out alerting configuration is present.
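A minimal Go sketch of the requested behaviour, assuming a reconcile step that receives the ConfigMap's annotations and its config.yaml string. The annotation key, the ConfigMap type and the render callback are hypothetical; this report does not name the actual annotation or the OBS code path. The point it illustrates is that the opt-out check runs before any parsing, so an opted-out string is returned byte-for-byte and nothing is written back.

```go
package main

import "fmt"

// skipAnnotation is a HYPOTHETICAL key; the report does not name the real
// annotation that tells Observability to leave alerting configuration alone.
const skipAnnotation = "example.io/disable-alerting-config"

// ConfigMap is a minimal stand-in holding only the fields this sketch needs.
type ConfigMap struct {
	Annotations map[string]string
	Data        map[string]string
}

// reconcileMonitoringConfig returns the config.yaml content to keep and
// whether a write back to the cluster is needed.
func reconcileMonitoringConfig(cm ConfigMap, render func(string) string) (string, bool) {
	current := cm.Data["config.yaml"]
	if _, ok := cm.Annotations[skipAnnotation]; ok {
		// Opted out: never parse or re-serialize, so the string stays identical.
		return current, false
	}
	desired := render(current)
	return desired, desired != current
}

func main() {
	original := "grafana:\n  enabled: false\n"
	cm := ConfigMap{
		Annotations: map[string]string{skipAnnotation: "true"},
		Data:        map[string]string{"config.yaml": original},
	}
	// A renderer that changes the text, standing in for a serializer round trip.
	render := func(s string) string { return s + "http: null\n" }

	out, write := reconcileMonitoringConfig(cm, render)
	fmt.Println(write, out == original) // prints: false true
}
```

Skipping the render entirely, rather than rendering and then comparing, is what keeps the policy's string comparison stable regardless of how the serializer formats its output.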
Version-Release number of selected component (if applicable):
2.7.0-DOWNSTREAM-2023-01-16-18-27-49
Hub OCP 4.11.19
SNO OCP 4.10.32
How reproducible:
Steps to Reproduce:
- ...
Actual results:
Expected results:
Additional info:
During ACM 2.7 large-scale testing (Run 13), we found that cluster sno02400 had a modified config file:
# oc --kubeconfig=/root/hv-vm/sno/manifests/sno02400/kubeconfig get cm -n openshift-monitoring cluster-monitoring-config -o yaml
apiVersion: v1
data:
  config.yaml: |
    alertmanagerMain:
      nodeSelector: null
      resources: null
      tolerations: null
      volumeClaimTemplate: null
    enableUserWorkload: null
    grafana:
      nodeSelector: null
      tolerations: null
    http: null
    k8sPrometheusAdapter: null
    kubeStateMetrics: null
    openshiftStateMetrics: null
    prometheusK8s:
      additionalAlertManagerConfigs: null
      externalLabels: null
      logLevel: ""
      nodeSelector: null
      remoteWrite: null
      resources: null
      retention: 24h
      tolerations: null
      volumeClaimTemplate: null
    prometheusOperator: null
    telemeterClient: null
    thanosQuerier: null
kind: ConfigMap
metadata:
  creationTimestamp: "2023-01-18T09:01:55Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "65919"
  uid: c210447a-d5bf-40bb-b2af-a3f1f48ed548
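The null-valued keys and the missing "enabled" settings above are both consistent with parsing the config into a typed struct and serializing it again. The report does not identify which serializer OBS uses, so this sketch uses Go's standard encoding/json as a stand-in, and the struct types are hypothetical, loosely modelled on keys in the dump. Two behaviours of encoding/json produce the two symptoms: unset map, slice and pointer fields without "omitempty" are emitted as null, and keys with no matching struct field (here "enabled") are silently ignored on unmarshal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HYPOTHETICAL typed model. There is deliberately no Enabled field, and no
// field uses `omitempty`, so unset fields serialize as null.
type Component struct {
	NodeSelector map[string]string `json:"nodeSelector"`
	Tolerations  []string          `json:"tolerations"`
}

type Prometheus struct {
	Retention    string            `json:"retention"`
	NodeSelector map[string]string `json:"nodeSelector"`
}

type MonitoringConfig struct {
	AlertmanagerMain   *Component  `json:"alertmanagerMain"`
	Grafana            *Component  `json:"grafana"`
	PrometheusK8s      *Prometheus `json:"prometheusK8s"`
	EnableUserWorkload *bool       `json:"enableUserWorkload"`
}

// roundTrip parses the config into the typed model and serializes it again.
func roundTrip(in string) (string, error) {
	var cfg MonitoringConfig
	if err := json.Unmarshal([]byte(in), &cfg); err != nil {
		return "", err
	}
	out, err := json.Marshal(cfg)
	return string(out), err
}

func main() {
	// JSON equivalent of the unmodified config.yaml from sno00001.
	original := `{"grafana":{"enabled":false},"alertmanagerMain":{"enabled":false},"prometheusK8s":{"retention":"24h"}}`
	rendered, err := roundTrip(original)
	if err != nil {
		panic(err)
	}
	fmt.Println(rendered)
	fmt.Println("byte-identical:", rendered == original)
}
```

The rendered string contains "nodeSelector":null entries, no longer contains "enabled", and is not byte-identical to the input, which is the combination a string-based policy check reports as noncompliant.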
Compare with an unmodified config file:
# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00001/kubeconfig get cm -n openshift-monitoring cluster-monitoring-config -o yaml
apiVersion: v1
data:
  config.yaml: |
    grafana:
      enabled: false
    alertmanagerMain:
      enabled: false
    prometheusK8s:
      retention: 24h
kind: ConfigMap
metadata:
  creationTimestamp: "2023-01-18T04:59:53Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "61261"
  uid: ad28cf77-5a2e-48e9-a0e7-5c6330415e70
is related to:
- ACM-4538 (ACM 2.6) Configuration for Observability needed (Closed)
- ACM-5623 (ACM 2.7) Configuration for Observability needed (Closed)
- ACM-5624 (ACM 2.8) Configuration for Observability needed (Closed)
relates to:
- OCPBUGS-1025 [tracker]cluster-monitoring-config race condition between Observability and du profile (ON_QA)