Bug
Resolution: Unresolved
4.21
Description of problem:
The existing RAN ReduceMonitoringFootprint source CR causes the informDuValidator.yaml validator policy to fail in MNO deployments.
Version-Release number of selected component (if applicable):
OCP 4.21/ACM 2.15+
How reproducible:
Always
Steps to Reproduce:
1. Deploy a hub cluster without Observability enabled.
2. Deploy an MNO spoke cluster with GitOps ZTP using the RAN RDS source CRs, which include the ReduceMonitoringFootprint CR.
3. Observe the deployment progress.
Actual results:
The spoke deployment fails because the ztp-group.group-du-standard-validator-v4.21-du-policy remains NonCompliant:
[kni@registry.kni-qe-97 ~]$ oc get policy -A
NAMESPACE NAME REMEDIATION ACTION COMPLIANCE STATE AGE
kni-qe-96 ztp-common.common-v4.21-config-policy enforce Compliant 3h40m
kni-qe-96 ztp-common.common-v4.21-subscriptions-policy enforce Compliant 3h40m
kni-qe-96 ztp-group.group-du-standard-v4.21-config-policy enforce Compliant 3h40m
kni-qe-96 ztp-group.group-du-standard-validator-v4.21-du-policy enforce NonCompliant 3h40m
kni-qe-96 ztp-site.kni-qe-96-v4.21-config-policy enforce Compliant 3h40m
[kni@registry.kni-qe-97 ~]$ oc get mcp -A
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
master rendered-master-b5a73d79f66578acee5849a74aa265b6 True False False 3 3 3 0 4h12m
worker rendered-worker-ea79df029175635fb9e102030187d7f6 False True True 2 1 1 1 4h12m
[kni@registry.kni-qe-97 ~]$ oc logs -n openshift-machine-config-operator -l k8s-app=machine-config-controller
Defaulted container "machine-config-controller" out of: machine-config-controller, kube-rbac-proxy
I0210 14:45:42.694076 1 drain_controller.go:162] evicting pod openshift-monitoring/prometheus-k8s-0
E0210 14:45:42.704400 1 drain_controller.go:162] error when evicting pods/"prometheus-k8s-0" -n "openshift-monitoring" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
I0210 14:45:47.704695 1 drain_controller.go:162] evicting pod openshift-monitoring/prometheus-k8s-0
I0210 14:45:47.704773 1 drain_controller.go:192] node worker-1.kni-qe-96.telcoqe.eng.rdu2.dc.redhat.com: Drain failed. Drain has been failing for more than 10 minutes. Waiting 5 minutes then retrying. Error message from drain: error when evicting pods/"prometheus-k8s-0" -n "openshift-monitoring": global timeout reached: 1m30s
E0210 14:45:47.705385 1 event.go:442] "Could not construct reference, will not report event" err="no kind is registered for the type v1.Node in scheme \"github.com/openshift/client-go/machineconfiguration/clientset/versioned/scheme/register.go:15\"" object="&Node{ObjectMeta:{worker-1.kni-qe-96.telcoqe.eng.rdu2.dc.redhat.com 063c387a-2b5c-45c5-a8b9-c6aef36b3ab8 90741 0 2026-02-10 10:52:58 +0000 UTC <nil> <nil> map[beta.kubernetes.io/arch:arm64 beta.kubernetes.io/os:linux kubernetes.io/arch:arm64 kubernetes.io/hostname:worker-1.kni-qe-96.telcoqe.eng.rdu2.dc.redhat.com kubernetes.io/os:linux node-role.kubernetes.io/worker: node.openshift.io/os_id:rhel sriovnetwork.openshift.io/device-plugin:Enabled] map[k8s.ovn.org/host-cidrs:[\"10.6.159.15/24\",\"2620:52:9:169f::1001/64\"] k8s.ovn.org/l3-gateway-config:{\"default\":{\"mode\":\"shared\",\"bridge-id\":\"br-ex\",\"interface-id\":\"br-ex_worker-1.kni-qe-96.telcoqe.eng.rdu2.dc.redhat.com\",\"mac-address\":\"b8:e9:24:80:74:0e\",\"ip-addresses\":[\"10.6.159.15/24\",\"2620:52:9:169f::1001/64\"],\"next-hops\":[\"10.6.159.254\",\"2620:52:9:169f::1\"],\"node-port-enable\":\"true\",\"vlan-id\":\"0\"}} k8s.ovn.org/layer2-topology-version:2.0 k8s.ovn.org/node-chassis-id:58f4f60e-2fb1-4794-9ff6-133f614de79c k8s.ovn.org/node-encap-ips:[\"10.6.159.15\"] k8s.ovn.org
During the node update, the node fails to drain because the prometheus-k8s pod cannot be evicted (PodDisruptionBudget violation). As a result, the worker MachineConfigPool never finishes updating, and the ztp-group.group-du-standard-validator-v4.21-du-policy never becomes compliant.
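The eviction error above is a PodDisruptionBudget violation: the drain cannot proceed while evicting prometheus-k8s-0 would drop the number of ready replicas below the budget. A PDB of roughly this shape blocks the drain (a sketch only; the exact names and values shipped by openshift-monitoring are an assumption, not taken from this cluster):

```yaml
# Hypothetical sketch of the blocking PDB; verify on the spoke with:
#   oc get pdb -n openshift-monitoring
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: prometheus-k8s
  namespace: openshift-monitoring
spec:
  minAvailable: 1          # eviction is refused if it would leave fewer ready pods
  selector:
    matchLabels:
      app.kubernetes.io/name: prometheus
```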
Expected results:
The MNO spoke deployment should succeed with the ReduceMonitoringFootprint CR applied when ACM Observability is not enabled on the hub.
Additional info:
With the following patch, the MNO deployment works as expected:
- path: source-crs/cluster-tuning/monitoring-configuration/ReduceMonitoringFootprint.yaml
  patches:
    - data:
        config.yaml: |
          alertmanagerMain:
            enabled: false
          telemeterClient:
            enabled: false
          prometheusK8s:
            retention: 24h
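For context, the patch replaces data.config.yaml of the ReduceMonitoringFootprint source CR, so the CR applied to the spoke ends up as a cluster-monitoring-config ConfigMap of roughly this shape (the metadata below follows the standard cluster-monitoring-config convention and is an assumption, not copied from the shipped source CR):

```yaml
# Sketch of the rendered CR after patching (metadata assumed)
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      enabled: false
    telemeterClient:
      enabled: false
    prometheusK8s:
      retention: 24h
```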
Applying the CR without the patch works as expected in SNO deployments.
Clones:
- OCPBUGS-63008: "Setting mco-disable-alerting to true makes ACM to remove a URL from hub-info secret" (Closed)