-
Story
-
Resolution: Unresolved
-
Major
The current OSSM 3.x observability documentation provides a baseline PodMonitor example for scraping istio-proxy metrics via User Workload Monitoring (UWM), but includes no guidance on metric thinning or on using metricRelabelings to control time-series cardinality. In multi-tenant clusters with many namespaces, deploying this PodMonitor as-is produces enough time series that Prometheus can run out of memory (OOM).
Customers are following our docs, hitting Prometheus instability, and being referred to upstream Kiali community documentation for the solution. This has resulted in customer escalations (OSSM-12491).
- OSSM 3.x observability docs (link) provide a PodMonitor/ServiceMonitor example with no cardinality tuning
- The only metric thinning guidance exists in Kiali community docs (link)
- The list of required Kiali metrics/attributes is only documented in the Kiali FAQ (link)
- The metricRelabelings field is supported on PodMonitor/ServiceMonitor CRDs but our OSSM docs don't reference it
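As an illustration of the gap described above, here is a minimal sketch of what a PodMonitor with `metricRelabelings` could look like. The resource name, namespace, selector, and the allow-list regex are assumptions made for this example and are not taken from the OSSM docs; the authoritative list of metrics Kiali needs is the one in the Kiali FAQ, and the scrape-target relabelings from the baseline example would need to be carried over unchanged.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: istio-proxies-monitor      # hypothetical name
  namespace: my-mesh-namespace     # hypothetical namespace
spec:
  selector:                        # assumed selector; use the one from the baseline example
    matchExpressions:
    - key: istio-prometheus-ignore
      operator: DoesNotExist
  podMetricsEndpoints:
  - path: /stats/prometheus
    interval: 30s
    # relabelings: keep the target relabelings from the baseline example here
    metricRelabelings:
    # Assumed allow-list: keep only the Istio standard request and TCP metrics
    # and drop every other series (including envoy_*) before ingestion.
    - action: keep
      sourceLabels: ["__name__"]
      regex: "istio_(requests_total|request_duration_milliseconds_(bucket|count|sum)|request_bytes_(bucket|count|sum)|response_bytes_(bucket|count|sum)|tcp_(sent|received)_bytes_total|tcp_connections_(opened|closed)_total)"
```

Because `metricRelabelings` is applied after the scrape and before ingestion, series that do not match the `keep` rule never reach Prometheus storage. A `drop` rule on unwanted metric names, or a `labeldrop` rule on unused labels, would be alternative ways to reduce cardinality with the same field.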
Our ideal outcome is documentation that:
- Definitively tells users how to configure OSSM to populate UWM with the Envoy and Istio metrics.
- Provides guidelines on how to do that without overloading UWM (scrape intervals, retention settings, and possibly sizing guidance).
- Provides a mechanism to scale down those Envoy and Istio metrics to only what Kiali uses, for users whose only mesh observability tooling is Kiali.
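To make the retention and sizing part of this concrete, the following is a rough sketch of the UWM settings such guidance might cover, expressed through the `user-workload-monitoring-config` ConfigMap. Every value shown is a placeholder assumed for illustration, not a recommendation or a tested sizing.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: user-workload-monitoring-config
  namespace: openshift-user-workload-monitoring
data:
  config.yaml: |
    prometheus:
      retention: 24h               # placeholder: how long UWM keeps samples
      enforcedSampleLimit: 50000   # placeholder: per-scrape sample cap for user targets
      resources:
        requests:
          memory: 2Gi              # placeholder sizing
        limits:
          memory: 4Gi              # placeholder sizing
```

The scrape interval itself is set per endpoint on the PodMonitor or ServiceMonitor (the `interval` field), so interval guidance would sit alongside the monitor example rather than in this ConfigMap.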