Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.11.z
Component/s: TALM Operator
Labels:

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
None
Latest Status Summary:

3/3: telco reviewed for 4.13
12/7: ACM story is ready to test, see ACM-1933
12/1: green for 4.12, the ACM story for 2.6.z will be ready tomorrow (though ACM is asking for help to test it)
Tracker for ACM stories ACM-1753 (2.5.z) & ACM-1933 (2.6.z).

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Proposed
Sprint:
None

Internal Whiteboard:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

While deploying many SNOs, more than half showed the common-config policy as NonCompliant because Observability also modifies the cluster-monitoring-config configmap. Whether the policy ends up Compliant or NonCompliant depends on the order in which the two modify the configmap.

Version-Release number of selected component (if applicable):

HUB OCP 4.11.2
SNO OCP 4.9.46
ACM 2.6 RC2 - 2.6.0-DOWNSTREAM-2022-08-26-01-33-09
Observability enabled

How reproducible:

Always

Steps to Reproduce:

1.
2.
3.

Actual results:

# oc get policy -n sno00007
NAME                                                REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-common.common-config-policy                     inform               NonCompliant       46h
ztp-common.common-subscriptions-policy              inform               Compliant          46h
ztp-group-du-sno.du-upgrade-platform-upgrade        inform               Compliant          25h
ztp-group-du-sno.du-upgrade-platform-upgrade-prep   inform               Compliant          25h
ztp-group.group-du-sno-config-log-policy            inform               Compliant          46h
ztp-group.group-du-sno-config-policy                inform               Compliant          46h
ztp-group.group-du-sno-config-storage-policy        inform               Compliant          46h
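
When checking many SNO namespaces, the table output above can be filtered for policies that are not Compliant. The following is a minimal sketch, not part of the original report: the helper name non_compliant is hypothetical, and it assumes the columns are separated by at least two spaces and that no column is empty, as in the output shown above.

```python
import re

def non_compliant(table_text):
    """Return the names of policies whose COMPLIANCE STATE is not 'Compliant'.

    Assumes `oc get policy` style output: a header row followed by rows whose
    columns are separated by two or more spaces, with no empty columns.
    """
    rows = [line for line in table_text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", rows[0].strip())
    state_idx = header.index("COMPLIANCE STATE")
    names = []
    for line in rows[1:]:
        cols = re.split(r"\s{2,}", line.strip())
        if cols[state_idx] != "Compliant":
            names.append(cols[0])
    return names

# Two rows copied from the output above.
sample = """\
NAME                                                REMEDIATION ACTION   COMPLIANCE STATE   AGE
ztp-common.common-config-policy                     inform               NonCompliant       46h
ztp-common.common-subscriptions-policy              inform               Compliant          46h
"""
print(non_compliant(sample))
```

For the two sample rows, only ztp-common.common-config-policy is reported.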

Expected results:

All policies to be compliant

Additional info:

Originally it was thought that openshift-monitoring was modifying the configmap, and the original bug was opened as https://issues.redhat.com/browse/OCPBUGS-870

The managedFields below show that the endpoint-monitoring-operator modified the configmap last, which made the policy fall out of compliance:

# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00007/kubeconfig get cm -n openshift-monitoring cluster-monitoring-config -o yaml --show-managed-fields=true
apiVersion: v1
data:
  config.yaml: |
    alertmanagerMain:
      nodeSelector: null
      resources: null
      tolerations: null
      volumeClaimTemplate: null
    enableUserWorkload: null
    grafana:
      nodeSelector: null
      tolerations: null
    http: null
    k8sPrometheusAdapter: null
    kubeStateMetrics: null
    openshiftStateMetrics: null
    prometheusK8s:
      additionalAlertManagerConfigs:
      - apiVersion: v2
        bearerToken:
          key: token
          name: observability-alertmanager-accessor
        pathPrefix: /
        scheme: https
        staticConfigs:
        - alertmanager-open-cluster-management-observability.apps.bm-stage.rdu2.scalelab.redhat.com
        tlsConfig:
          ServerName: ""
          ca:
            key: service-ca.crt
            name: hub-alertmanager-router-ca
          insecureSkipVerify: false
      externalLabels:
        cluster: 3f2759fc-42e3-4851-8099-5f5ad646f171
      logLevel: ""
      nodeSelector: null
      remoteWrite: null
      resources: null
      retention: 24h
      tolerations: null
      volumeClaimTemplate: null
    prometheusOperator: null
    telemeterClient: null
    thanosQuerier: null
kind: ConfigMap
metadata:
  creationTimestamp: "2022-09-06T16:12:13Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data: {}
    manager: config-policy-controller
    operation: Update
    time: "2022-09-06T16:12:13Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:config.yaml: {}
    manager: endpoint-monitoring-operator
    operation: Update
    time: "2022-09-06T16:14:53Z"
  name: cluster-monitoring-config
  namespace: openshift-monitoring
  resourceVersion: "21926"
  uid: d2cad672-4f6e-40d2-8d1b-f760668e48bf
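
The check on managedFields can also be done programmatically. The following is a minimal sketch, not part of the original report: the helper name last_writer is hypothetical, and the entries are copied by hand from the output above (with fieldsV1 omitted) rather than fetched from the cluster.

```python
from datetime import datetime

# managedFields entries copied from the ConfigMap output above (fieldsV1 omitted).
managed_fields = [
    {"manager": "config-policy-controller", "operation": "Update",
     "time": "2022-09-06T16:12:13Z"},
    {"manager": "endpoint-monitoring-operator", "operation": "Update",
     "time": "2022-09-06T16:14:53Z"},
]

def last_writer(entries):
    """Return the managedFields entry with the most recent timestamp."""
    def parse(entry):
        # Timestamps in managedFields are UTC, e.g. 2022-09-06T16:14:53Z.
        return datetime.strptime(entry["time"], "%Y-%m-%dT%H:%M:%SZ")
    return max(entries, key=parse)

latest = last_writer(managed_fields)
print(f"{latest['manager']} wrote last at {latest['time']}")
```

For the entries above, this names endpoint-monitoring-operator as the last writer, about two and a half minutes after config-policy-controller.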

is duplicated by

OCPBUGS-870 cluster-monitoring-config was replaced during ZTP policy rollout

Closed

is related to

ACM-2949 Observability writing null config values to DU profile maintained configuration causes policies to show non-compliant

Closed

ACM-4538 (ACM 2.6) Configuration for Observability needed

Closed

ACM-5623 (ACM 2.7) Configuration for Observability needed

Closed

ACM-5624 (ACM 2.8) Configuration for Observability needed

Closed

is triggering

ACM-1845 RFE Provide the Policy merge capability within stream data contained in a ConfigMap

Closed


Assignee: Subbarao Meduri

Reporter: Alex Krzos

Need Info From: None

Contributors: None

QA Contact: Alex Krzos

Doc Contact: None

Votes: 0

Watchers: 16

Created: 2022/09/08 2:03 PM

Updated: 2025/07/29 11:32 AM
