Red Hat Advanced Cluster Management / ACM-28638

non-OCP Spoke alertforwarding non-functioning


    • Moderate
    • None

      Description of problem:

      The endpoint-operator is unable to configure alert forwarding from Prometheus on non-OCP spokes. As a result, the addon never progresses and the endpoint-monitoring-operator crashes with the following error:

      2026-01-16T09:39:08.677Z ERROR Reconciler error {"controller": "observabilityaddon-controller", "controllerGroup": "observability.open-cluster-management.io", "controllerKind": "ObservabilityAddon", "ObservabilityAddon": {"name":"observability-alertmanager-accessor","namespace":"open-cluster-management-addon-observability"}, "namespace": "open-cluster-management-addon-observability", "name": "observability-alertmanager-accessor", "reconcileID": "1f30170d-209c-4b66-92ba-2ada2f4b6432", "error": "failed to create or update cluster monitoring config: failed to create or update the alertmanager accessor token secret: fail to get open-cluster-management-addon-observability/observability-alertmanager-accessor secret: secrets \"observability-alertmanager-accessor\" not found"}
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
      /cachi2/output/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.7/pkg/internal/controller/controller.go:316
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
      /cachi2/output/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.7/pkg/internal/controller/controller.go:263
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
      /cachi2/output/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.7/pkg/internal/controller/controller.go:224
      

      The issue appears to be the following:

      1. When we try to create the configmap, we first set the namespace to `open-cluster-management-addon-observability`, which differs from OCP clusters (where we set `openshift-monitoring`): https://github.com/stolostron/multicluster-observability-operator/blob/dc0cfb9d43c224721c61ebaeb23a563c5620a134/operators/endpointmetrics/controllers/observabilityendpoint/ocp_monitoring_config.go#L390

      2. We then create the secret with the cluster-id postfix. I think that will work the first time around: https://github.com/stolostron/multicluster-observability-operator/blob/dc0cfb9d43c224721c61ebaeb23a563c5620a134/operators/endpointmetrics/controllers/observabilityendpoint/ocp_monitoring_config.go#L399

      3. We then delete the old secrets, i.e. the ones without postfixes. On OCP spokes, the namespace will be `openshift-monitoring`, but on non-OCP spokes, as mentioned in step 1, we target the `open-cluster-management-addon-observability` namespace.

      4. On the next reconcile, we again hit `createHubAmAccessorTokenSecret`. This function expects the non-postfixed secret to exist, but it was deleted in step 3. It then fails here when getting the secret, because we expect it to be there without the postfix: https://github.com/stolostron/multicluster-observability-operator/blob/dc0cfb9d43c224721c61ebaeb23a563c5620a134/operators/endpointmetrics/controllers/observabilityendpoint/ocp_monitoring_config.go#L191

      Version-Release number of selected component (if applicable):

      How reproducible:

      • ?

      Steps to Reproduce:

      1. ...

      Actual results:

      • addon broken, alert forwarding not configured, endpoint operator crashing

      Expected results:

      • alert forwarding works

      Additional info:

              rh-ee-coquadro Coleen Iona Quadros
              rh-ee-jachanse Jacob Baungard Hansen