Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: ACM 2.8.0
Affects Version/s: None
Component/s: Observability
Labels:

Story Points:
1
Blocked:
False
Blocked Reason:
None
Ready:
False
Git Pull Request:
https://github.com/stolostron/multicluster-observability-operator/pull/1199
Intelligence Requested:
Market:

Sprint:
Observability Sprint 2023-08
Severity:
Important

Regression:
No

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

While deploying 3000+ SNOs with ACM and ZTP, we found that occasional clusters report the common-config-policy as noncompliant. The violation is that the cluster-monitoring-config was modified. It appears that OBS modified it by re-rendering the config file through some sort of YAML serializer, which outputs unexpected null values. Because the cluster-monitoring-config is simply a YAML file inserted into a ConfigMap as a single string, ACM policy compares the modified string against the expected string and finds it noncompliant. Furthermore, it appears that some configuration may have been dropped. The fix should be for OBS to not re-render the config file at all, and to leave it alone when the annotation that prevents OBS from rolling out alerting configuration exists.
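The suspected mechanism can be illustrated with a minimal sketch. It is not the operator's code: the typed model, the `rerender` and `reconcile` helpers, and the `SKIP_ANNOTATION` key are all hypothetical, and JSON from the Python standard library stands in for the YAML serializer. It only shows how parsing a config into a typed structure and serializing it back can emit nulls for unset fields and silently drop keys the model does not know, and how a guard on an annotation would leave the original string untouched.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional

# Hypothetical typed model. It knows only a few keys, and unset fields
# are serialized explicitly instead of being omitted.
@dataclass
class Component:
    nodeSelector: Optional[dict] = None
    tolerations: Optional[list] = None

@dataclass
class PrometheusK8s:
    retention: str = ""
    remoteWrite: Optional[list] = None

@dataclass
class Config:
    grafana: Optional[Component] = None
    alertmanagerMain: Optional[Component] = None
    prometheusK8s: Optional[PrometheusK8s] = None
    telemeterClient: Optional[dict] = None

def _known(cls, raw):
    """Build cls from raw, discarding keys the model does not declare."""
    names = {f.name for f in fields(cls)}
    return cls(**{k: v for k, v in (raw or {}).items() if k in names})

def rerender(text: str) -> str:
    """Parse into the typed model and serialize again (the lossy round trip)."""
    raw = json.loads(text)
    cfg = Config(
        grafana=_known(Component, raw.get("grafana")),
        alertmanagerMain=_known(Component, raw.get("alertmanagerMain")),
        prometheusK8s=_known(PrometheusK8s, raw.get("prometheusK8s")),
        telemeterClient=raw.get("telemeterClient"),
    )
    return json.dumps(asdict(cfg), indent=2)

# Hypothetical annotation key, used here only as a placeholder.
SKIP_ANNOTATION = "example.com/disable-alerting"

def reconcile(text: str, annotations: dict) -> str:
    """Return the config untouched when the skip annotation is set."""
    if annotations.get(SKIP_ANNOTATION) == "true":
        return text
    return rerender(text)

# JSON equivalent of the unmodified config.yaml shown in this report.
ORIGINAL = json.dumps({
    "grafana": {"enabled": False},
    "alertmanagerMain": {"enabled": False},
    "prometheusK8s": {"retention": "24h"},
})

if __name__ == "__main__":
    print(rerender(ORIGINAL))  # "enabled" is gone, unset fields become null
    print(reconcile(ORIGINAL, {SKIP_ANNOTATION: "true"}) == ORIGINAL)
```

Because the policy compares the ConfigMap value as a string, any round trip of this kind is enough to trigger a violation, even when the re-rendered content would be semantically equivalent.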

Version-Release number of selected component (if applicable):

2.7.0-DOWNSTREAM-2023-01-16-18-27-49
Hub OCP 4.11.19
SNO OCP 4.10.32

How reproducible:

Steps to Reproduce:

Actual results:

Expected results:

Additional info:

During ACM 2.7 large-scale testing (Run 13), we found that cluster sno02400 showed a modified config file:

# oc --kubeconfig=/root/hv-vm/sno/manifests/sno02400/kubeconfig get cm -n openshift-monitoring cluster-monitoring-config -o yaml                    
apiVersion: v1
data:
  config.yaml: |
    alertmanagerMain:
      nodeSelector: null
      resources: null
      tolerations: null
      volumeClaimTemplate: null
    enableUserWorkload: null
    grafana:
      nodeSelector: null
      tolerations: null
    http: null
    k8sPrometheusAdapter: null
    kubeStateMetrics: null
    openshiftStateMetrics: null
    prometheusK8s:
      additionalAlertManagerConfigs: null
      externalLabels: null
      logLevel: ""
      nodeSelector: null
      remoteWrite: null
      resources: null
      retention: 24h
      tolerations: null
      volumeClaimTemplate: null
    prometheusOperator: null
    telemeterClient: null
    thanosQuerier: null
kind: ConfigMap
metadata:
  creationTimestamp: "2023-01-18T09:01:55Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "65919"
  uid: c210447a-d5bf-40bb-b2af-a3f1f48ed548

Compare to a non-modified config file:

# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00001/kubeconfig get cm -n openshift-monitoring cluster-monitoring-config -o yaml
apiVersion: v1
data:
  config.yaml: |
    grafana:
      enabled: false
    alertmanagerMain:
      enabled: false
    prometheusK8s:
       retention: 24h
kind: ConfigMap
metadata:
  creationTimestamp: "2023-01-18T04:59:53Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "61261"
  uid: ad28cf77-5a2e-48e9-a0e7-5c6330415e70
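To make the difference between the two ConfigMaps concrete, the following sketch runs a line-based diff over the two `config.yaml` strings copied from the outputs above. It uses only Python's standard `difflib`; the helper name `dropped_lines` is an illustration, and the comparison is textual rather than YAML-aware, which mirrors the string comparison described in this report.

```python
import difflib

# config.yaml from sno00001 (unmodified), as shown above.
EXPECTED = '''\
grafana:
  enabled: false
alertmanagerMain:
  enabled: false
prometheusK8s:
   retention: 24h
'''

# config.yaml from sno02400 (modified), as shown above.
OBSERVED = '''\
alertmanagerMain:
  nodeSelector: null
  resources: null
  tolerations: null
  volumeClaimTemplate: null
enableUserWorkload: null
grafana:
  nodeSelector: null
  tolerations: null
http: null
k8sPrometheusAdapter: null
kubeStateMetrics: null
openshiftStateMetrics: null
prometheusK8s:
  additionalAlertManagerConfigs: null
  externalLabels: null
  logLevel: ""
  nodeSelector: null
  remoteWrite: null
  resources: null
  retention: 24h
  tolerations: null
  volumeClaimTemplate: null
prometheusOperator: null
telemeterClient: null
thanosQuerier: null
'''

def dropped_lines(expected: str, observed: str) -> list:
    """Return the lines of `expected` that the diff reports as removed."""
    diff = difflib.unified_diff(
        expected.splitlines(), observed.splitlines(),
        fromfile="expected", tofile="observed", lineterm="",
    )
    return [l[1:] for l in diff if l.startswith("-") and not l.startswith("---")]

if __name__ == "__main__":
    print("strings equal:", EXPECTED == OBSERVED)
    for line in dropped_lines(EXPECTED, OBSERVED):
        print("dropped:", repr(line))
```

Both `enabled: false` settings show up as dropped, which is consistent with the observation that some configuration was lost in addition to the null values being added.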

is related to

ACM-4538 (ACM 2.6) Configuration for Observability needed (Closed)

ACM-5623 (ACM 2.7) Configuration for Observability needed (Closed)

ACM-5624 (ACM 2.8) Configuration for Observability needed (Closed)

relates to

OCPBUGS-1025 [tracker]cluster-monitoring-config race condition between Observability and du profile (ON_QA)

Assignee: Subbarao Meduri

Reporter: Alex Krzos

QA Contact: Xiang Yin

Votes: 0

Watchers: 9

Created: 2023/01/19 8:18 PM

Updated: 2023/06/06 7:04 PM

Resolved: 2023/06/06 7:04 PM
