-
Task
-
Resolution: Unresolved
-
Normal
-
subs-swatch-lightning
-
-
As part of SWATCH-3571, we verified that the accuracy of the following metrics is around 90%:
- Metered: swatch_metrics_ingested_usage_total
- Tallied: swatch_tally_tallied_usage_total
- Remitted Success: swatch_producer_metered_total
On the other hand, the following metrics can't be used for alerting because usages are "retried" when previous remittances failed to be submitted:
- Billing Pending: swatch_billable_usage_total
- Remitted Failures: swatch_producer_metered_total
I reported SWATCH-3633 to investigate whether we can fix these two metrics.
However, we want to better understand why the accuracy of the first three metrics is only around 90%, and work out how best to write the alerts.
Acceptance Criteria
- Update the verification steps from https://docs.google.com/document/d/1liKSpUL1WIRO_MhUKA7OEmNCx4n8foBRmLQEvDpDz6I/edit?usp=sharing to group by date instead of by month, and check whether the accuracy of the data improves.
- Analyse how to write the alert, taking into account that we'll be using only the Grafana metrics. For example, if the metered metric has the value 100 and the tallied metric has the value 20, the difference is 80: the tallied value is only 20% of the metered value, well below the 90% accuracy threshold, so an alert should fire to prompt further investigation.
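The alert condition in the example above can be sketched in Python to make the arithmetic explicit. This is only an illustration under stated assumptions: accuracy is taken to be 1 - |metered - tallied| / metered, the 0.90 threshold comes from the ~90% figure in the description, and the function names are hypothetical. The real alert would be expressed as a Grafana/Prometheus rule over swatch_metrics_ingested_usage_total and swatch_tally_tallied_usage_total, not as application code.

```python
def accuracy(metered: float, tallied: float) -> float:
    """Return how closely the tallied value matches the metered value.

    Assumed definition (not confirmed by the ticket):
    1 - |metered - tallied| / metered, so 1.0 means a perfect match.
    """
    if metered == 0:
        # Nothing was metered: only a tallied value of 0 counts as accurate.
        return 1.0 if tallied == 0 else 0.0
    return 1.0 - abs(metered - tallied) / metered


def should_alert(metered: float, tallied: float, threshold: float = 0.90) -> bool:
    """Alert when accuracy drops below the threshold (assumed to be 90%)."""
    return accuracy(metered, tallied) < threshold


if __name__ == "__main__":
    # Example from the acceptance criteria: metered=100, tallied=20.
    print(should_alert(100, 20))  # accuracy is about 0.20, so this alerts
    print(should_alert(100, 95))  # accuracy is about 0.95, so no alert
```

Dividing by the metered value treats the metered metric as the source of truth; if the comparison should be symmetric, the denominator would need to change accordingly.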
- is related to
- SWATCH-3633 Spike: Investigate how to fix the metrics swatch_billable_usage_total and swatch_producer_metered_total to not exclude already counted usage (Backlog)
- SWATCH-3571 Spike: Investigate and Verify Prometheus Metric Accuracy for Metering (Closed)