OpenShift Bugs / OCPBUGS-76491

Adding the ReduceMonitoringFootprint RAN RDS CR into an MNO causes validation to fail

    • Bug
    • Resolution: Unresolved
    • 4.21
    • GitOps ZTP
    • Quality / Stability / Reliability
    • Important
    • All
    • Rejected

      Description of problem:

      The existing RAN ReduceMonitoringFootprint source CR causes the informDuValidatorValidator.yaml validator to fail in MNO deployments.

      Version-Release number of selected component (if applicable):

      OCP 4.21/ACM 2.15+
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Deploy a hub cluster without Observability enabled.
          2. Deploy an MNO spoke cluster via GitOps ZTP using the RAN RDS source CRs, which include the ReduceMonitoringFootprint CR.
          3. Observe the deployment progress.

      Actual results:

      Spoke deployment fails due to Policies being NonCompliant. 
      
      [kni@registry.kni-qe-97 ~]$ oc get policy -A
      NAMESPACE   NAME                                                    REMEDIATION ACTION   COMPLIANCE STATE   AGE
      kni-qe-96   ztp-common.common-v4.21-config-policy                   enforce              Compliant          3h40m
      kni-qe-96   ztp-common.common-v4.21-subscriptions-policy            enforce              Compliant          3h40m
      kni-qe-96   ztp-group.group-du-standard-v4.21-config-policy         enforce              Compliant          3h40m
      kni-qe-96   ztp-group.group-du-standard-validator-v4.21-du-policy   enforce              NonCompliant       3h40m
      kni-qe-96   ztp-site.kni-qe-96-v4.21-config-policy                  enforce              Compliant          3h40m
      
      [kni@registry.kni-qe-97 ~]$ oc get mcp -A 
      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   rendered-master-b5a73d79f66578acee5849a74aa265b6   True      False      False      3              3                   3                     0                      4h12m
      worker   rendered-worker-ea79df029175635fb9e102030187d7f6   False     True       True       2              1                   1                     1                      4h12m
      
      [kni@registry.kni-qe-97 ~]$ oc logs -n openshift-machine-config-operator -l k8s-app=machine-config-controller
      Defaulted container "machine-config-controller" out of: machine-config-controller, kube-rbac-proxy
      I0210 14:45:42.694076       1 drain_controller.go:162] evicting pod openshift-monitoring/prometheus-k8s-0
      E0210 14:45:42.704400       1 drain_controller.go:162] error when evicting pods/"prometheus-k8s-0" -n "openshift-monitoring" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
      I0210 14:45:47.704695       1 drain_controller.go:162] evicting pod openshift-monitoring/prometheus-k8s-0
      I0210 14:45:47.704773       1 drain_controller.go:192] node worker-1.kni-qe-96.telcoqe.eng.rdu2.dc.redhat.com: Drain failed. Drain has been failing for more than 10 minutes. Waiting 5 minutes then retrying. Error message from drain: error when evicting pods/"prometheus-k8s-0" -n "openshift-monitoring": global timeout reached: 1m30s
      E0210 14:45:47.705385       1 event.go:442] "Could not construct reference, will not report event" err="no kind is registered for the type v1.Node in scheme \"github.com/openshift/client-go/machineconfiguration/clientset/versioned/scheme/register.go:15\"" object="&Node{ObjectMeta:{worker-1.kni-qe-96.telcoqe.eng.rdu2.dc.redhat.com    063c387a-2b5c-45c5-a8b9-c6aef36b3ab8 90741 0 2026-02-10 10:52:58 +0000 UTC <nil> <nil> map[beta.kubernetes.io/arch:arm64 beta.kubernetes.io/os:linux kubernetes.io/arch:arm64 kubernetes.io/hostname:worker-1.kni-qe-96.telcoqe.eng.rdu2.dc.redhat.com kubernetes.io/os:linux node-role.kubernetes.io/worker: node.openshift.io/os_id:rhel sriovnetwork.openshift.io/device-plugin:Enabled] map[k8s.ovn.org/host-cidrs:[\"10.6.159.15/24\",\"2620:52:9:169f::1001/64\"] k8s.ovn.org/l3-gateway-config:{\"default\":{\"mode\":\"shared\",\"bridge-id\":\"br-ex\",\"interface-id\":\"br-ex_worker-1.kni-qe-96.telcoqe.eng.rdu2.dc.redhat.com\",\"mac-address\":\"b8:e9:24:80:74:0e\",\"ip-addresses\":[\"10.6.159.15/24\",\"2620:52:9:169f::1001/64\"],\"next-hops\":[\"10.6.159.254\",\"2620:52:9:169f::1\"],\"node-port-enable\":\"true\",\"vlan-id\":\"0\"}} k8s.ovn.org/layer2-topology-version:2.0 k8s.ovn.org/node-chassis-id:58f4f60e-2fb1-4794-9ff6-133f614de79c k8s.ovn.org/node-encap-ips:[\"10.6.159.15\"] k8s.ovn.org     
      
      During the node update, the worker node fails to drain because the prometheus-k8s-0 pod cannot be evicted without violating its PodDisruptionBudget. As a result, the worker MCP never finishes updating, and the ztp-group.group-du-standard-validator-v4.21-du-policy never becomes compliant.
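The blocked eviction can be confirmed from the machine-config-controller logs. A minimal sketch, assuming the log format shown above; the helper name drain_blocked_by_pdb is hypothetical:

```shell
# Hypothetical helper: returns success when a machine-config-controller
# log line shows an eviction blocked by a PodDisruptionBudget.
drain_blocked_by_pdb() {
  printf '%s\n' "$1" | grep -q "violate the pod's disruption budget"
}

# On a live spoke you would feed it the real controller logs:
#   oc logs -n openshift-machine-config-operator \
#     -l k8s-app=machine-config-controller
sample='Cannot evict pod as it would violate the pod'\''s disruption budget.'
if drain_blocked_by_pdb "$sample"; then
  echo "drain blocked by PDB"
fi
```

On the spoke, `oc get pdb -n openshift-monitoring` shows which budget is holding the prometheus-k8s pods in place while the drain retries.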

      Expected results:

      MNO Spoke deployment should succeed with the ReduceMonitoringFootprint CR applied when ACM observability is not enabled in the hub

      Additional info:

      With this patch, MNO deployment works as expected:
      
      - path: source-crs/cluster-tuning/monitoring-configuration/ReduceMonitoringFootprint.yaml
        patches:
          - data:
              config.yaml: |
                alertmanagerMain:
                  enabled: false
                telemeterClient:
                  enabled: false
                prometheusK8s:
                  retention: 24h
      
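For reference, the patch above should render the source CR into the standard cluster-monitoring-config ConfigMap along these lines (a sketch assembled from the patch data; any fields of the unpatched source CR beyond those shown are not reproduced here):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    alertmanagerMain:
      enabled: false
    telemeterClient:
      enabled: false
    prometheusK8s:
      retention: 24h
```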
      Applying the CR without the patch works correctly in SNO deployments.

              sahasan@redhat.com Sabbir Hasan
              amendiol@redhat.com Alaitz Mendiola
              Abraham Miller, Alaitz Mendiola, Alexander Gurenko, Ian Miller, Jim Ramsay