Red Hat Advanced Cluster Management / ACM-2949

Observability writing null config values to DU profile maintained configuration causes policies to show non-compliant


    • Sprint: Observability Sprint 2023-08
    • Priority: Important

      Description of problem:

      While deploying 3000+ SNOs with ACM and ZTP, we have found occasional clusters showing the common-config-policy as NonCompliant, with the violation being that the cluster-monitoring-config ConfigMap was modified. It appears that observability (OBS) modified it by re-rendering the config file through some sort of YAML serializer, which writes out unexpected null values for unset fields. Since cluster-monitoring-config is simply a YAML file inserted into a ConfigMap as a single string, when ACM policy compares the modified string against the expected string, it is found NonCompliant. Furthermore, it appears that some configuration may have been dropped (the grafana and alertmanagerMain enabled: false settings are missing from the re-rendered file below). The fix here should be for OBS to not re-render the config file at all, and to leave it alone when the annotation that prevents OBS from rolling out alerting configuration is present (see the sketch at the end of the Additional info section).
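
      For illustration, a minimal Go sketch of the suspected mechanism, using gopkg.in/yaml.v2 with hypothetical struct and field names (not the operator's actual types): round-tripping the user's config.yaml string through typed structs whose fields lack omitempty tags writes every unset field back out as null (or "" for strings), producing the kind of null-filled output seen in the modified ConfigMap below. The real operator presumably also merges or filters fields, which would explain the dropped enabled settings.

      package main

      import (
          "fmt"

          "gopkg.in/yaml.v2"
      )

      // Hypothetical subset of the cluster monitoring config; these are NOT
      // the operator's real types. Map/pointer fields without `omitempty`
      // reproduce the observed null output.
      type prometheusK8s struct {
          Retention    string            `yaml:"retention"`
          LogLevel     string            `yaml:"logLevel"`
          NodeSelector map[string]string `yaml:"nodeSelector"`
      }

      type monitoringConfig struct {
          AlertmanagerMain map[string]interface{} `yaml:"alertmanagerMain"`
          Grafana          map[string]interface{} `yaml:"grafana"`
          PrometheusK8s    *prometheusK8s         `yaml:"prometheusK8s"`
          TelemeterClient  map[string]interface{} `yaml:"telemeterClient"`
      }

      func main() {
          // The user-managed config.yaml string, as delivered by the DU profile.
          original := "grafana:\n  enabled: false\nalertmanagerMain:\n  enabled: false\nprometheusK8s:\n  retention: 24h\n"

          var cfg monitoringConfig
          if err := yaml.Unmarshal([]byte(original), &cfg); err != nil {
              panic(err)
          }

          // Re-marshaling writes every struct field, so unset fields come back
          // as null ("" for strings) and the single-string ConfigMap value no
          // longer matches the string the policy expects.
          out, err := yaml.Marshal(&cfg)
          if err != nil {
              panic(err)
          }
          fmt.Println(string(out))
      }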

      Version-Release number of selected component (if applicable):

       ACM 2.7.0-DOWNSTREAM-2023-01-16-18-27-49
      Hub OCP 4.11.19
      SNO OCP 4.10.32

      How reproducible:

      Occasional; only a small number of clusters out of the 3000+ SNOs deployed showed the issue.

      Steps to Reproduce:

      1. Deploy SNO clusters at scale with ACM and ZTP, with the DU profile delivering the cluster-monitoring-config ConfigMap via the common-config-policy.
      2. Enable ACM observability for the managed clusters.
      3. Check common-config-policy compliance and inspect the cluster-monitoring-config ConfigMap in openshift-monitoring on affected clusters.

      Actual results:

      On affected clusters, the config.yaml string in the cluster-monitoring-config ConfigMap has been re-rendered with null (or empty) values for unset fields and some settings dropped, so the common-config-policy shows NonCompliant.

      Expected results:

      The cluster-monitoring-config ConfigMap is left byte-for-byte as delivered by the DU profile, and the common-config-policy remains Compliant.

      Additional info:

      In ACM 2.7 large-scale testing (Run 13), we found that cluster sno02400 showed a modified config file:

       

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno02400/kubeconfig get cm -n openshift-monitoring cluster-monitoring-config -o yaml                    
      apiVersion: v1
      data:
        config.yaml: |
          alertmanagerMain:
            nodeSelector: null
            resources: null
            tolerations: null
            volumeClaimTemplate: null
          enableUserWorkload: null
          grafana:
            nodeSelector: null
            tolerations: null
          http: null
          k8sPrometheusAdapter: null
          kubeStateMetrics: null
          openshiftStateMetrics: null
          prometheusK8s:
            additionalAlertManagerConfigs: null
            externalLabels: null
            logLevel: ""
            nodeSelector: null
            remoteWrite: null
            resources: null
            retention: 24h
            tolerations: null
            volumeClaimTemplate: null
          prometheusOperator: null
          telemeterClient: null
          thanosQuerier: null
      kind: ConfigMap
      metadata:
        creationTimestamp: "2023-01-18T09:01:55Z"
        name: cluster-monitoring-config
        namespace: openshift-monitoring
        resourceVersion: "65919"
        uid: c210447a-d5bf-40bb-b2af-a3f1f48ed548

      Compare to an unmodified config file on another cluster (sno00001):

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno00001/kubeconfig get cm -n openshift-monitoring cluster-monitoring-config -o yaml
      apiVersion: v1
      data:
        config.yaml: |
          grafana:
            enabled: false
          alertmanagerMain:
            enabled: false
          prometheusK8s:
             retention: 24h
      kind: ConfigMap
      metadata:
        creationTimestamp: "2023-01-18T04:59:53Z"
        name: cluster-monitoring-config
        namespace: openshift-monitoring
        resourceVersion: "61261"
        uid: ad28cf77-5a2e-48e9-a0e7-5c6330415e70
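
      A minimal sketch of the proposed fix direction, with a hypothetical helper name and an illustrative annotation key (the real check should use whatever annotation disables the alerting rollout): when the annotation is present, skip the unmarshal/merge/marshal round trip entirely and return the user's config.yaml string byte-for-byte, so the policy's string comparison keeps matching.

      package main

      import "fmt"

      // reconcileConfigYAML is a hypothetical sketch, not the operator's actual
      // API: when alerting rollout is disabled via annotation, return the
      // user-managed config.yaml string untouched instead of re-rendering it.
      func reconcileConfigYAML(configYAML string, annotations map[string]string,
          render func(string) (string, error)) (string, error) {
          // Annotation key is illustrative; the real check should use the
          // annotation that disables the observability alerting rollout.
          if annotations["mco-disable-alerting"] == "true" {
              return configYAML, nil // byte-for-byte passthrough keeps the policy Compliant
          }
          return render(configYAML) // existing merge/re-render path
      }

      func main() {
          user := "grafana:\n  enabled: false\nprometheusK8s:\n  retention: 24h\n"
          out, _ := reconcileConfigYAML(user,
              map[string]string{"mco-disable-alerting": "true"},
              func(s string) (string, error) { return s, nil })
          fmt.Print(out) // unchanged, so the expected-vs-actual comparison still matches
      }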

       

            Assignee: Subbarao Meduri (smeduri1@redhat.com)
            Reporter: Alex Krzos (akrzos@redhat.com)
            QA Contact: Xiang Yin