We should provide and document a default set of metrics, even if some are not consumed by the dashboards, so that potential user needs are covered at low cost. This set will also be needed for dashboard consumption (
This epic covers:
- From the operator, configure FLP to provide a set of metrics (we can start from https://github.com/netobserv/flowlogs-pipeline/blob/main/contrib/kubernetes/flowlogs-pipeline.conf.yaml#L171 and refine). We need to be careful about cardinality, i.e. not index all IPs/pods, for instance.
- Document this set of metrics
- From the operator, create the needed ServiceMonitor resources (including one for the console plugin, which also exposes internal metrics).
- Provide guidance on deploying a Prometheus instance (similar to our loki-zero-click install), or on configuring the cluster Prometheus to collect third-party metrics.
- Provide an automated installation of the Prometheus operator when needed/desired, as in the Dependent Operators PoC.
- Note that in the downstream version, metrics should be scraped by the cluster monitoring Prometheus (this is allowed for registered Red Hat operators), so there is no need to install a Prometheus operator in that case; it is only needed upstream. See also: https://docs.google.com/document/d/1Mru7pqkpx2gmxMxK6AYOLVphWW0mzwZG-R8_HBVxLwI
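To make the cardinality concern concrete: the worst-case number of time series a single Prometheus metric can produce is the product of the number of distinct values of each of its labels. The Go sketch below is illustrative only; the label names and cluster sizes (50 namespaces, 2000 pods) are hypothetical, and `estimateSeries` is not part of FLP.

```go
package main

import "fmt"

// estimateSeries returns the worst-case number of series for one metric:
// the product of the number of distinct values of each label.
// Hypothetical helper for illustration; not an FLP or Prometheus API.
func estimateSeries(labelCardinality map[string]int) int {
	series := 1
	for _, distinctValues := range labelCardinality {
		series *= distinctValues
	}
	return series
}

func main() {
	// Hypothetical cluster with 50 namespaces and 2000 pods.
	byNamespace := map[string]int{
		"src_namespace": 50,
		"dst_namespace": 50,
	}
	byPod := map[string]int{
		"src_pod": 2000,
		"dst_pod": 2000,
	}
	fmt.Println("namespace-to-namespace series:", estimateSeries(byNamespace))
	fmt.Println("pod-to-pod series:", estimateSeries(byPod))
}
```

With these assumed numbers, aggregating by namespace gives at most 2,500 series per metric, while indexing source and destination pods gives up to 4,000,000 (only observed pairs actually materialize, but the bound grows quadratically). This is why per-pod or per-IP labels are best kept out of the default set.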
We may have three levels of configuration for metrics collection:
- Turn on metrics collection (internal metrics + flow metrics)
- Turn on a minimal set of metrics (only the flow metrics that are, or will be, used in dashboards)
- Turn off all metrics
The default should be the first level (internal metrics + flow metrics).
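One possible way for the operator to map these three levels onto what it enables is sketched below in Go. The `MetricsLevel` type, its values, `planFor`, and the metric names are all hypothetical, not an existing NetObserv API. The sketch assumes the minimal level excludes internal metrics (it reads "only the flow metrics" literally) and that an unset value falls back to the default, full level.

```go
package main

import (
	"fmt"
	"strings"
)

// MetricsLevel is a hypothetical operator setting mirroring the three levels.
type MetricsLevel string

const (
	MetricsFull    MetricsLevel = "Full"    // internal metrics + flow metrics (default)
	MetricsMinimal MetricsLevel = "Minimal" // only flow metrics used by dashboards
	MetricsOff     MetricsLevel = "Off"     // no metrics at all
)

// MetricsPlan describes what would be enabled for a given level.
type MetricsPlan struct {
	InternalMetrics bool
	FlowMetrics     []string
}

// Hypothetical metric names, for illustration only.
var dashboardFlowMetrics = []string{"namespace_flows_total", "namespace_bytes_total"}
var extraFlowMetrics = []string{"node_bytes_total", "workload_bytes_total"}

// planFor resolves a level into a plan; an empty level means the default (Full).
func planFor(level MetricsLevel) (MetricsPlan, error) {
	switch level {
	case "", MetricsFull:
		// Copy into a fresh slice so the shared package-level slices are never mutated.
		all := append(append([]string{}, dashboardFlowMetrics...), extraFlowMetrics...)
		return MetricsPlan{InternalMetrics: true, FlowMetrics: all}, nil
	case MetricsMinimal:
		return MetricsPlan{FlowMetrics: append([]string{}, dashboardFlowMetrics...)}, nil
	case MetricsOff:
		return MetricsPlan{}, nil
	default:
		return MetricsPlan{}, fmt.Errorf("unknown metrics level %q", level)
	}
}

func main() {
	for _, level := range []MetricsLevel{"", MetricsMinimal, MetricsOff} {
		plan, err := planFor(level)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("level=%q internal=%v flows=%s\n",
			level, plan.InternalMetrics, strings.Join(plan.FlowMetrics, ","))
	}
}
```

Resolving the level into an explicit plan keeps the decision in one place, so the FLP configuration and the ServiceMonitor creation can both be driven from the same result.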