Most of our Grafana dashboards have a chart for CPU usage, but for some reason they do not all use the same metric for it. For example:
- The Kafka dashboard uses container_cpu_usage_seconds_total (https://github.com/strimzi/strimzi-kafka-operator/blob/d407e36726b9c68bdddbc5fc7f1ae342ac787a4e/examples/metrics/grafana-dashboards/strimzi-kafka.json#L855). The ZooKeeper dashboard appears to use the same metric. For example:
sum(rate(container_cpu_usage_seconds_total{namespace=\"$kubernetes_namespace\",pod=~\"$strimzi_cluster_name-$kafka_broker\",container=\"kafka\"}[5m])) by (pod)
- The Kafka Connect dashboard uses process_cpu_seconds_total (https://github.com/strimzi/strimzi-kafka-operator/blob/d407e36726b9c68bdddbc5fc7f1ae342ac787a4e/examples/metrics/grafana-dashboards/strimzi-kafka-connect.json#L433):
rate(process_cpu_seconds_total{strimzi_io_kind=~\"KafkaConnect.*\",strimzi_io_cluster=\"$strimzi_connect_cluster_name\"}[1m])
I think we should ideally use the same metric in all dashboards so that they give a consistent view of CPU usage.
This was raised in Strimzi#4135.
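
To see which dashboards use which CPU metric, something like the following could help. It is only a rough sketch, not an existing tool in the repository: the helper names (`iter_exprs`, `cpu_metrics_in_dashboard`, `audit`, `load_dashboard`) are made up for illustration, and it assumes the PromQL queries are stored under `expr` keys in the dashboard JSON, as in the two examples above.

```python
import json
import re
from collections import defaultdict

# The two CPU metrics mentioned in this issue; extend the tuple if others turn up.
CPU_METRICS = ("container_cpu_usage_seconds_total", "process_cpu_seconds_total")


def load_dashboard(path):
    """Parse one Grafana dashboard JSON file into Python objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def iter_exprs(node):
    """Yield every string stored under an "expr" key anywhere in the JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "expr" and isinstance(value, str):
                yield value
            else:
                yield from iter_exprs(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_exprs(item)


def cpu_metrics_in_dashboard(dashboard):
    """Return the set of known CPU metrics referenced by a parsed dashboard."""
    found = set()
    for expr in iter_exprs(dashboard):
        for metric in CPU_METRICS:
            # Word boundaries avoid matching a longer metric with the same prefix.
            if re.search(r"\b" + re.escape(metric) + r"\b", expr):
                found.add(metric)
    return found


def audit(dashboards):
    """Map each CPU metric to the sorted names of the dashboards that use it."""
    usage = defaultdict(list)
    for name, dashboard in dashboards.items():
        for metric in cpu_metrics_in_dashboard(dashboard):
            usage[metric].append(name)
    return {metric: sorted(names) for metric, names in usage.items()}
```

Calling `audit({"kafka": load_dashboard("strimzi-kafka.json"), "connect": load_dashboard("strimzi-kafka-connect.json")})` should then list each metric next to the dashboards that reference it, which makes any remaining inconsistency easy to spot.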