Resolution: Unresolved
ACM MCO must-gather metrics and dashboards
Not Selected
To Do
We need to ship component dashboards for the Thanos components shipped with ACM, so that customers can debug issues and share extracts of those dashboards without knowing the relevant metrics ahead of time.
We need to decide which of these approaches is better:
- OCP Dashboards: add dashboards for the ACM Thanos components to the OCP Observe tab (https://github.com/grafana/loki/pull/9468)
- ACM Managed Grafana: add it as a datasource to the MCO-deployed Grafana (https://github.com/stolostron/multicluster-observability-operator/blob/main/loaders/dashboards/examples/thanos/README.md)
Requirements:
- MUST gather platform metrics related to ACM MCO components in a standard format (either directly via CMO on the Hub or via remote-write into MCO Thanos)
- MUST ship dashboards by default in the ACM Observability stack (either via MCO Grafana or the OCP Console)
- MUST include instructions for customers on how to supply metrics that are useful to ACM MCO Support
- SHOULD understand the impact of platform metrics on the idle cost of MCO on the Hub
- COULD use Grafana dashboard snapshots or Prometheus TSDB snapshots to automate gathering metrics over a set time window and let engineers explore the data retroactively
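One possible shape for the customer instructions and the "gather metrics over a set time window" item is a small script that pulls a fixed list of series from a Prometheus-compatible endpoint (for example Thanos Query) through the standard `/api/v1/query_range` HTTP API and writes them to a CSV that can be attached to a support case. This is only a sketch under assumptions: the endpoint URL, the `open-cluster-management-observability` namespace label, and the query list are hypothetical placeholders, and authentication (bearer tokens, oauth-proxy routes) is not handled.

```python
import csv
import json
import urllib.parse
import urllib.request
from datetime import datetime, timedelta, timezone

# Hypothetical selectors Support might ask for; replace with the real
# MCO/Thanos component metrics. The namespace value is an assumption.
DEFAULT_QUERIES = (
    'up{namespace="open-cluster-management-observability"}',
    'process_resident_memory_bytes{namespace="open-cluster-management-observability"}',
)


def build_query_range_url(base_url, query, start, end, step_seconds):
    """Build a Prometheus-compatible /api/v1/query_range URL."""
    params = urllib.parse.urlencode({
        "query": query,
        "start": start.timestamp(),  # Unix seconds (float is accepted)
        "end": end.timestamp(),
        "step": f"{step_seconds}s",
    })
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


def flatten_matrix(query, response):
    """Turn a query_range JSON response into (query, labels, ts, value) rows."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error', 'unknown error')}")
    rows = []
    for series in response["data"]["result"]:
        # Serialize the label set deterministically so rows are comparable.
        labels = json.dumps(series["metric"], sort_keys=True)
        for ts, value in series["values"]:
            rows.append((query, labels, float(ts), value))
    return rows


def gather(base_url, queries=DEFAULT_QUERIES, window=timedelta(hours=6), step_seconds=60):
    """Fetch every query over the last `window` and return all rows."""
    end = datetime.now(timezone.utc)
    start = end - window
    rows = []
    for query in queries:
        url = build_query_range_url(base_url, query, start, end, step_seconds)
        with urllib.request.urlopen(url) as resp:  # no auth in this sketch
            rows.extend(flatten_matrix(query, json.load(resp)))
    return rows


def write_csv(rows, path):
    """Write gathered rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["query", "labels", "timestamp", "value"])
        writer.writerows(rows)
```

A customer would run something like `write_csv(gather("https://<thanos-query-route>"), "mco-metrics.csv")`. Exporting raw series keeps the extract independent of whichever dashboard option is chosen. For the TSDB-snapshot alternative, Prometheus exposes `POST /api/v1/admin/tsdb/snapshot` only when started with `--web.enable-admin-api`; whether that is available on the Hub is not assumed here.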