Story
Resolution: Unresolved
Critical
None
None
5
False
False
subs-swatch-lightning
We will set up alerts based on differences between the various states (with a low threshold - starting with 0%):
We can use "on(product, metric_id, billing_provider)" to join the meter counters.
- Metered usage exceeds tallied usage
- swatch_metrics_ingested_usage_total / swatch_tally_tallied_usage_total > 1.0
- Message: "Metered usage of {swatch_metrics_ingested_usage_total} exceeds tallied usage of {swatch_tally_tallied_usage_total} for product: {product_tag}, metric_id: {metric_id}, and billing_provider: {billing_provider}"
- Tally usage exceeds metered usage
- swatch_tally_tallied_usage_total / swatch_metrics_ingested_usage_total > 1.0
- Message: "Tallied usage of {swatch_tally_tallied_usage_total} exceeds metered usage of {swatch_metrics_ingested_usage_total} for product: {product_tag}, metric_id: {metric_id}, and billing_provider: {billing_provider}"
- Tallied usage exceeds billable/covered usage by greater than 1%; 1% allowed due to integer rounding.
- swatch_tally_tallied_usage_total / (swatch_contract_usage_total + swatch_billable_usage_total{status="pending"}) > 1.01
- Message: "Tallied usage of {swatch_tally_tallied_usage_total} exceeds billable and contract covered usage of {swatch_contract_usage_total + swatch_billable_usage_total{status="pending"}} by greater than 1% for product: {product_tag}, metric_id: {metric_id}, and billing_provider: {billing_provider}"
- Billable and contract covered usage exceeds tallied usage by greater than 1%; 1% allowed due to integer rounding
- (swatch_contract_usage_total + swatch_billable_usage_total{status="pending"}) / swatch_tally_tallied_usage_total > 1.01
- Message: "Billable and contract covered usage of {swatch_contract_usage_total + swatch_billable_usage_total{status="pending"}} exceeds tallied usage of {swatch_tally_tallied_usage_total} by greater than 1% for product: {product_tag}, metric_id: {metric_id}, and billing_provider: {billing_provider}"
- Billable usage exceeds remitted usage
- swatch_billable_usage_total{status="pending"} offset 1h / swatch_producer_metered_total > 1.0
- Message: "Billable usage of {swatch_billable_usage_total{status="pending"} offset 1h} exceeds remitted usage of {swatch_producer_metered_total} for product: {product_tag}, metric_id: {metric_id}, and billing_provider: {billing_provider}"
- Remitted usage exceeds billable usage
- swatch_producer_metered_total / swatch_billable_usage_total{status="pending"} offset 1h > 1.0
- Message: "Remitted usage of {swatch_producer_metered_total} exceeds pending billable usage of {swatch_billable_usage_total{status="pending"} offset 1h} for product: {product_tag}, metric_id: {metric_id}, and billing_provider: {billing_provider}"
This alerting will be available in production as well as the canary test environment.
When these stay in a state for more than 10 minutes, we'll trigger an alert. Note we may need to adjust the percentages over time in order to reduce false positives.
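As a rough illustration of how one of the scenarios above could be written as a Prometheus alerting rule, here is a minimal sketch for "Metered usage exceeds tallied usage". The group name, alert name, severity label, and the `sum by (...)` aggregation are assumptions made for the example, not part of this ticket. It also assumes the join label is named `product`, as in the `on(...)` clause, rather than `product_tag`. The annotation reports the ratio because a rule annotation only has direct access to the value of the alert expression.

```yaml
# Sketch only: group name, alert name, and severity are hypothetical.
groups:
  - name: swatch-metering-pipeline
    rules:
      - alert: SwatchMeteredUsageExceedsTalliedUsage
        # Aggregate both counters down to the join labels so that the
        # one-to-one match on(product, metric_id, billing_provider) is unambiguous.
        expr: |
          sum by (product, metric_id, billing_provider) (swatch_metrics_ingested_usage_total)
            / on(product, metric_id, billing_provider)
          sum by (product, metric_id, billing_provider) (swatch_tally_tallied_usage_total)
            > 1.0
        # Fire only after the condition has held for 10 minutes.
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: >-
            Metered usage exceeds tallied usage for product: {{ $labels.product }},
            metric_id: {{ $labels.metric_id }}, and
            billing_provider: {{ $labels.billing_provider }} (ratio: {{ $value }})
```

The other scenarios would follow the same shape, swapping the numerator and denominator and using 1.01 as the threshold where the 1% rounding allowance applies.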
Note: How to deal with products that are not billable?
Done
- Separate alerts created for the scenarios listed
- If any of the above stays in that state for more than 10 minutes, the alert is fired
- The alert will
- Send a message to the swatch-alerts Slack channel
- Fire off in PagerDuty or whatever mechanism is used to text Barnaby
- PromQL tests created for each alert
- SOP created for each alert
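One possible shape for the per-alert PromQL tests is a promtool rule unit test. The sketch below assumes a hypothetical rule file `swatch-metering-alerts.yaml` that defines an alert named `SwatchMeteredUsageExceedsTalliedUsage` with `for: 10m`, a `severity: critical` label, a join label named `product`, and no annotations. The file name, alert name, label values, and sample values are all made up for illustration.

```yaml
# Sketch only: rule file name, alert name, and all label/sample values are hypothetical.
rule_files:
  - swatch-metering-alerts.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Metered usage stays at 12 while tallied usage stays at 10 (ratio 1.2).
      - series: 'swatch_metrics_ingested_usage_total{product="example-product", metric_id="Cores", billing_provider="aws"}'
        values: '12+0x20'
      - series: 'swatch_tally_tallied_usage_total{product="example-product", metric_id="Cores", billing_provider="aws"}'
        values: '10+0x20'
    alert_rule_test:
      # Before the 10 minute hold period has elapsed, nothing should be firing.
      - eval_time: 5m
        alertname: SwatchMeteredUsageExceedsTalliedUsage
        exp_alerts: []
      # After the hold period, the alert is expected to fire.
      - eval_time: 15m
        alertname: SwatchMeteredUsageExceedsTalliedUsage
        exp_alerts:
          - exp_labels:
              severity: critical
              product: example-product
              metric_id: Cores
              billing_provider: aws
            # If the rule defines annotations, list them under
            # exp_annotations as well, since promtool compares them too.
```

A file like this would be run with `promtool test rules <test-file>`.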
- is blocked by
  - SWATCH-3573 Spike: Test and Define PromQL Alert Queries for Metering Pipeline (Backlog)
  - SWATCH-2297 Add a metric for ingested usage data from Prometheus (Closed)
  - SWATCH-2299 Add a metric for usage tallied (Closed)
  - SWATCH-2300 Add a metric for usage covered by a contract (Closed)
  - SWATCH-2301 Add a metric for usage considered billable (Closed)
  - SWATCH-2302 Add a metric to swatch-producer-aws for metered usage (Closed)
  - SWATCH-2303 Add a metric to swatch-producer-azure for metered usage (Closed)
  - SWATCH-2304 Add a metric for usage aggregated per hour (Closed)
  - SWATCH-3571 Spike: Investigate and Verify Prometheus Metric Accuracy for Metering (Closed)
- is cloned by
  - SWATCH-3448 Create alerts for PAYG if remittance stops (Closed)