OpenShift Bugs / OCPBUGS-65953

Rendered cluster-monitoring-config cm is mangled, causes prometheus-k8s-0 pod restart loop

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects Version: 4.21
    • Component: GitOps ZTP
    • Severity: Moderate

      Description of problem:

      
      With the telco-ran source CR ReduceMonitoringFootprint.yaml applied to a spoke running
      OCP 4.20 via ZTP/TALM 4.21 (IBU target host), the rendered ConfigMap is mangled:
      
      apiVersion: v1
      data:
        config.yaml: "alertmanagerMain:\n  enabled: false\ntelemeterClient:\n  enabled:
          false\nnodeExporter:\n  collectors:\n    buddyinfo: {}\n    cpufreq: {}\n    ksmd:
          {}\n    mountstats: {}\n    netclass: {}\n    netdev: {}\n    processes: {}\n
          \   systemd: {}\n    tcpstat: {}\nprometheusK8s:\n  additionalAlertmanagerConfigs:\n
          \ - apiVersion: v2\n    bearerToken:\n      key: token\n      name: observability-alertmanager-accessor\n
          \   scheme: https\n    staticConfigs:\n    - \n    tlsConfig:\n      ca:\n        key:
          service-ca.crt\n        name: hub-alertmanager-router-ca\n      insecureSkipVerify:
          false\n  externalLabels:\n    managed_cluster: b9ca26b2-55ed-4ec8-826b-3917eb8e24c7\n
          \ retention: 24h\n"
      kind: ConfigMap
      metadata:
        creationTimestamp: "2025-11-24T23:01:14Z"
        name: cluster-monitoring-config
        namespace: openshift-monitoring
        resourceVersion: "12472"
        uid: 0e5629d5-8da9-4c48-b001-75c3ad713bb5
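      
      For reference, the embedded config.yaml can be dumped straight from the live ConfigMap to see
      exactly what the cluster-monitoring-operator receives (a minimal sketch using standard oc
      jsonpath escaping; note the empty staticConfigs list item in the output, which may be what
      breaks the Prometheus configuration):
      
      # On the spoke: print only the embedded monitoring configuration
      $ oc -n openshift-monitoring get configmap cluster-monitoring-config \
          -o jsonpath='{.data.config\.yaml}'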
      
      ######
      
      When this CR is applied on a spoke via ZTP/TALM 4.20, the ConfigMap appears to render correctly,
      and the prometheus-k8s-0 pod is stable.
      apiVersion: v1
      data:
        config.yaml: |
          alertmanagerMain:
            enabled: false
          telemeterClient:
            enabled: false
          prometheusK8s:
            retention: 24h
      kind: ConfigMap
      metadata:
        annotations:
          ran.openshift.io/ztp-deploy-wave: "1"
        creationTimestamp: "2025-11-24T23:19:41Z"
        name: cluster-monitoring-config
        namespace: openshift-monitoring
        resourceVersion: "21867"
        uid: 2c762218-020b-40a1-8916-1225287b96e1
      
      With ZTP 4.21, if the CR is completely overridden in the PolicyGenTemplate (PGT), i.e. like this:
      
          - fileName: ReduceMonitoringFootprint.yaml
            data:
              config.yaml: |
                alertmanagerMain:
                  enabled: false
                telemeterClient:
                  enabled: false
                prometheusK8s:
                  retention: 24h
      
      and the prometheus-k8s-0 pod is deleted after the policy change updates the
      cluster-monitoring-config ConfigMap in openshift-monitoring to match the 4.20 rendering above,
      then the pod starts and remains stable.
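      
      A minimal sketch of that recovery step on the spoke (pod and namespace names as above):
      
      # After the corrected ConfigMap lands, restart the Prometheus pod and watch it settle
      $ oc -n openshift-monitoring delete pod prometheus-k8s-0
      $ oc -n openshift-monitoring get pod prometheus-k8s-0 -w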
      
      
          

      Version-Release number of selected component (if applicable):

      
      In both cases the spoke is deployed with OCP 4.20.
      
      The hubs for the working and non-working spoke environments have these in common:
      $ oc get clusterversions.config.openshift.io 
      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.21.0-ec.3   True        False         3d6h    Cluster version is 4.21.0-ec.3
      [kni@kni-qe-82 dgonyier]$ source ./csv-list 
      advanced-cluster-management.v2.15.0
      multicluster-engine.v2.10.0
      openshift-gitops-operator.v1.18.1
      packageserver
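      
      (The ./csv-list helper is not shown; assuming it lists installed operator CSVs via OLM, an
      equivalent command would be something like:)
      
      $ oc get csv -A -o custom-columns=NAME:.metadata.name --no-headers | sort -u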
      
      Differences between the hubs:
      working:
      ZTP 4.20
      TALM 4.20.1
      
      non-working:
      ZTP 4.21
      TALM 4.21.0
      
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Deploy a spoke with the RAN DU profile, using the hub CSV versions for the failing case shown above
          2. Review the monitoring policy on the hub and the live CR on the spoke (see the sketch below)
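      
      For step 2, something along these lines works (policy and namespace names are
      environment-specific and shown only as placeholders):
      
      # On the hub: locate the generated monitoring policy and inspect its rendered object definition
      $ oc get policies -A | grep -i monitoring
      $ oc -n <cluster-namespace> get policy <policy-name> -o yaml
      
      # On the spoke: check the live ConfigMap and the Prometheus pod
      $ oc -n openshift-monitoring get configmap cluster-monitoring-config -o yaml
      $ oc -n openshift-monitoring get pod prometheus-k8s-0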
          

      Actual results:

      The rendered cluster-monitoring-config ConfigMap on the spoke is mangled and the prometheus-k8s-0 pod goes into a restart loop
          

      Expected results:

      The ConfigMap renders cleanly and the prometheus-k8s-0 pod remains stable
          

      Additional info:

      
      Currently the source CR is identical between main and release-4.20 branches.
      
      I am not sure whether this is due to ZTP 4.21 or TALM 4.21.
      
      Workarounds:
      don't use the ReduceMonitoringFootprint.yaml CR
      OR
      override the `data:` field in the policy template with the working config as described above.
          

              Assignee: Sabbir Hasan (sahasan@redhat.com)
              Reporter: Dwaine Gonyier (rhn-support-dgonyier)