-
Task
-
Resolution: Won't Do
-
Normal
-
None
-
None
-
False
-
None
-
False
-
No
-
MGDSRVS-170 - Improve Monitoring, Metrics and Observability capabilities to support Production Workloads
WHAT
The new metrics introduced by the work of MGDSTRM-10148 must carry the instance_name and instance_id labels. This change helps put those labels in place.
WHY
Supports the use-case being delivered by this Epic. This also takes a step towards MGDSTRM-7080.
HOW
The metrics produced by fleetshard already have the instance_name label. We need to add the instance_id label.
- Change fleetshard MetricManager to populate an instance_id label from the org.bf2.operator.resources.v1alpha1.ManagedKafka#ID label on the ManagedKafka resource. This will add the label to all instance-specific metrics emitted by fleetshard, including the kafka_instance_connection_limit metric required by this change.
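The MetricManager change above could look roughly like the following. This is a minimal sketch in plain Java, not the fleetshard implementation: the class name `InstanceTagsSketch`, the method `instanceTags`, and the use of a plain map in place of the real metrics API are hypothetical, and the value `bf2.org/id` for `ManagedKafka#ID` is an assumption based on the label name used elsewhere in this ticket.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InstanceTagsSketch {
    // Assumed value of ManagedKafka#ID; the ticket names the constant, not its value.
    static final String ID_LABEL = "bf2.org/id";

    // Builds the tag set attached to every instance-specific metric.
    // resourceName stands in for the ManagedKafka resource name,
    // resourceLabels for its metadata labels.
    static Map<String, String> instanceTags(String resourceName, Map<String, String> resourceLabels) {
        Map<String, String> tags = new LinkedHashMap<>();
        tags.put("instance_name", resourceName);
        String id = resourceLabels.get(ID_LABEL);
        // Only add instance_id when the resource actually carries the id label.
        if (id != null && !id.isEmpty()) {
            tags.put("instance_id", id);
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(instanceTags("my-kafka", Map.of(ID_LABEL, "abc123")));
    }
}
```

Whether to omit instance_id or emit a placeholder value when the resource has no id label is a design decision that the real change would need to settle; the sketch simply omits it.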
In RHOSAK the Kafka metrics already have an instance_name label. This is produced by a relabelling rule that takes the strimzi_io_cluster label produced by Strimzi and maps it to instance_name. There is no label containing the instance's id, so we will make a change that causes the scrape to emit a bf2_org_id label containing it.
- Change fleetshard org.bf2.operator.operands.KafkaCluster#buildKafkaLabels to include the instance's id as a bf2.org/id label on the Kafka CR.
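As an illustration of the buildKafkaLabels change, the sketch below adds the id label to an existing label map. It is a simplified stand-in under stated assumptions: `KafkaLabelsSketch` and `withInstanceId` are hypothetical names, and the real method's signature and existing labels are not shown in this ticket.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KafkaLabelsSketch {
    // Label key named in this ticket for the Kafka CR.
    static final String ID_LABEL = "bf2.org/id";

    // Returns a copy of the labels already built for the Kafka CR,
    // with the instance id added when one is available.
    static Map<String, String> withInstanceId(Map<String, String> existingLabels, String instanceId) {
        Map<String, String> labels = new LinkedHashMap<>(existingLabels);
        if (instanceId != null && !instanceId.isEmpty()) {
            labels.put(ID_LABEL, instanceId);
        }
        return labels;
    }

    public static void main(String[] args) {
        // The existing label here is a made-up example, not a real fleetshard label.
        Map<String, String> existing = Map.of("example.org/existing", "value");
        System.out.println(withInstanceId(existing, "abc123"));
    }
}
```

Copying the map rather than mutating it keeps the sketch free of side effects; the real method may reasonably build the map in place.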
Note that:
- Strimzi will propagate the label to the pods.
- Prometheus automatically creates metric labels from the pod's labels during ingestion. This means that metrics such as kafka_server_socket_server_metrics_connection_count will automatically have a bf2_org_id label.
- MGDSTRM-10287 will use a relabelling rule to map bf2_org_id to instance_id.
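The label flow described in the notes above can be modelled in a few lines. The sketch below is only an illustration in Java of what happens to the label name, not Prometheus configuration: `sanitizeLabelName` mimics the way characters outside `[a-zA-Z0-9_]` in a Kubernetes label key become underscores, and `copyLabel` stands in for the relabelling rule planned in MGDSTRM-10287. Both names are hypothetical, and keeping the source label after copying is an assumption, since this ticket does not say whether the rule drops bf2_org_id.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LabelFlowSketch {
    // Replaces every character that is not valid in a metric label name with '_'.
    static String sanitizeLabelName(String kubernetesLabelKey) {
        return kubernetesLabelKey.replaceAll("[^a-zA-Z0-9_]", "_");
    }

    // Models a relabelling rule that copies the value of one label into another.
    // The source label is kept; whether the real rule drops it is not stated here.
    static Map<String, String> copyLabel(Map<String, String> labels, String source, String target) {
        Map<String, String> result = new LinkedHashMap<>(labels);
        String value = result.get(source);
        if (value != null) {
            result.put(target, value);
        }
        return result;
    }

    public static void main(String[] args) {
        String scraped = sanitizeLabelName("bf2.org/id");
        Map<String, String> metricLabels = new LinkedHashMap<>();
        metricLabels.put(scraped, "abc123");
        System.out.println(copyLabel(metricLabels, scraped, "instance_id"));
    }
}
```

The same sanitization explains the existing mapping: the Strimzi pod label strimzi.io/cluster surfaces as strimzi_io_cluster before it is relabelled to instance_name.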
We should verify that the addition of these labels won't upset existing alerts or dashboards.
DONE
- Changes made to fleetshard with appropriate unit tests.