-
Bug
-
Resolution: Done
-
Major
-
ACM 2.15.1
-
Quality / Stability / Reliability
-
False
-
-
False
-
-
-
Moderate
-
None
Description of problem:
The endpoint-operator is unable to configure alert forwarding from Prometheus on non-OCP spokes. As a result, the addon never progresses and the endpoint-monitoring-operator crashes with:
2026-01-16T09:39:08.677Z ERROR Reconciler error {"controller": "observabilityaddon-controller", "controllerGroup": "observability.open-cluster-management.io", "controllerKind": "ObservabilityAddon", "ObservabilityAddon": {"name":"observability-alertmanager-accessor","namespace":"open-cluster-management-addon-observability"}, "namespace": "open-cluster-management-addon-observability", "name": "observability-alertmanager-accessor", "reconcileID": "1f30170d-209c-4b66-92ba-2ada2f4b6432", "error": "failed to create or update cluster monitoring config: failed to create or update the alertmanager accessor token secret: fail to get open-cluster-management-addon-observability/observability-alertmanager-accessor secret: secrets \"observability-alertmanager-accessor\" not found"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
/cachi2/output/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.7/pkg/internal/controller/controller.go:316
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
/cachi2/output/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.7/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
/cachi2/output/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.19.7/pkg/internal/controller/controller.go:224
The issue appears to be the following:
1. When we try to create the configmap, we first set the namespace to `open-cluster-management-addon-observability`, which differs from OCP clusters (where we set `openshift-monitoring`): https://github.com/stolostron/multicluster-observability-operator/blob/dc0cfb9d43c224721c61ebaeb23a563c5620a134/operators/endpointmetrics/controllers/observabilityendpoint/ocp_monitoring_config.go#L390
2. We then create the secret with the cluster-ID postfix. I think that will work the first time around? https://github.com/stolostron/multicluster-observability-operator/blob/dc0cfb9d43c224721c61ebaeb23a563c5620a134/operators/endpointmetrics/controllers/observabilityendpoint/ocp_monitoring_config.go#L399
3. We then delete the old secrets, i.e. the ones without postfixes. On OCP spokes, the namespace will be `openshift-monitoring`, but on non-OCP spokes, as mentioned in 1., we target the `open-cluster-management-addon-observability` namespace.
4. On the next reconcile, we again hit `createHubAmAccessorTokenSecret`. This function expects the non-postfixed secret to exist, but it was deleted in step 3. It then fails when getting the secret, because it looks the secret up without the postfix: https://github.com/stolostron/multicluster-observability-operator/blob/dc0cfb9d43c224721c61ebaeb23a563c5620a134/operators/endpointmetrics/controllers/observabilityendpoint/ocp_monitoring_config.go#L191
Version-Release number of selected component (if applicable):
How reproducible:
- ?
Steps to Reproduce:
- ...