-
Bug
-
Resolution: Done
-
Critical
-
ACM 2.8.0, ACM 2.7.0
-
1
-
False
-
None
-
False
-
-
-
Observability Sprint 2023-11, Observability Sprint 2023-15
-
Important
-
No
Description of problem:
ACM is monitoring a HyperShift on-prem installation:
ACM is installed on the cluster that contains the HyperShift operator. That is, ACM is running on the hosting cluster.
The ACM Clusters Overview dashboard does not show control plane data (etcd, API server) for the hosted clusters. (see the )
The ACM Clusters Overview dashboard does show the regular data for the hosted clusters.
Version-Release number of selected component (if applicable):
How reproducible: Always
Steps to Reproduce:
Actual results: Dashboard missing data
Expected results: Dashboard should contain the data
Additional info:
This was reproduced in a lab environment. Credentials for that system can be obtained privately from sberens@redhat.com
Debugging details -
- The ACM ServiceMonitor created in the hosted-control-plane namespace does not contain the right TLS credentials for the UWL Prometheus to pick up the data.
- HyperShift creates two ServiceMonitors in the same namespace, called etcd and kube-apiserver.
- ACM mimics them and creates acm-etcd and acm-kube-apiserver. The ACM ServiceMonitors were pointing to Secrets (created by HyperShift) for TLS credentials, but judging from the HyperShift-created ServiceMonitors, it looks like they should reference a ConfigMap (also created by HyperShift) instead.
- Once we fix the ServiceMonitor manually:
  - We do see that these `targets` are active in the Prometheus UI (OCP Console -> Observe -> Targets).
  - We also see the ACM UWL metric collector send `additional` time series data (per the metric collector logs).
  - However, the UWL metric collector has an incorrect cluster and cluster_id in its arguments. These do not point to the hosted control plane; they point to local-cluster instead.
- Once this was fixed manually:
  - We expected the data to appear in the dashboard, but it did not, due to another issue with Thanos (which we are handling separately because it is unrelated to the HyperShift issue).
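
For anyone re-checking the ServiceMonitor mismatch described above, here is a minimal sketch of how the comparison could be scripted. It assumes the two ServiceMonitors have already been exported and loaded as Python dicts (for example from `oc get servicemonitor ... -o json`), and it only looks at the `ca` and `cert` references under `spec.endpoints[].tlsConfig`. The helper names `tls_ref_kinds` and `tls_mismatches` and the object name `example-ca` are hypothetical and used for illustration only; this is not how ACM or HyperShift performs the check.

```python
from typing import Any


def tls_ref_kinds(service_monitor: dict[str, Any]) -> list[dict[str, str]]:
    """For each endpoint, record whether tlsConfig.ca / tlsConfig.cert
    reference a 'secret' or a 'configMap', as '<kind>/<name>'."""
    kinds = []
    for endpoint in service_monitor.get("spec", {}).get("endpoints", []):
        tls = endpoint.get("tlsConfig", {})
        entry = {}
        for field in ("ca", "cert"):
            ref = tls.get(field, {})
            for kind in ("secret", "configMap"):
                if kind in ref:
                    entry[field] = f"{kind}/{ref[kind]['name']}"
        kinds.append(entry)
    return kinds


def tls_mismatches(reference: dict[str, Any], mirrored: dict[str, Any]) -> list[str]:
    """List the TLS references that differ between the reference
    ServiceMonitor and the mirrored one, endpoint by endpoint."""
    mismatches = []
    pairs = zip(tls_ref_kinds(reference), tls_ref_kinds(mirrored))
    for index, (ref, mir) in enumerate(pairs):
        for field in sorted(set(ref) | set(mir)):
            if ref.get(field) != mir.get(field):
                mismatches.append(
                    f"endpoint {index} {field}: "
                    f"reference={ref.get(field)} mirrored={mir.get(field)}"
                )
    return mismatches


# Hypothetical manifests: the reference uses a ConfigMap for the CA,
# while the mirrored copy points at a Secret with the same name.
reference = {"spec": {"endpoints": [
    {"tlsConfig": {"ca": {"configMap": {"name": "example-ca", "key": "ca.crt"}}}}
]}}
mirrored = {"spec": {"endpoints": [
    {"tlsConfig": {"ca": {"secret": {"name": "example-ca", "key": "ca.crt"}}}}
]}}

for line in tls_mismatches(reference, mirrored):
    print(line)
```

Passing the HyperShift-created ServiceMonitor (etcd or kube-apiserver) as `reference` and the ACM-created one (acm-etcd or acm-kube-apiserver) as `mirrored` would list any endpoint where the two disagree on Secret versus ConfigMap. An empty result means the TLS references match for the fields checked.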