Type: Task
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: swatch-billable-usage, swatch-producer-aws, swatch-producer-azure
Labels:
- refineable

Blocked: False
Blocked Reason: None
Ready: False
Epic Link: payg-monitoring-alerting-improvements
AssignedTeam: subs-swatch-lightning

To better understand the issue, consider this scenario:
1. We receive an event with a metric value of "100".
2. We process an aggregated usage that includes the above metric value of "100".
3. The producer fails to submit this usage, so the record stays with status "failed".
4. We receive another event with a metric value of "20".
5. Because there is an existing "failed" usage, we add the new value to the previous one, so the total usage is now "120".
6. The producer fails again to submit this usage, so the record stays with status "failed".
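
The steps above can be sketched as a small simulation. This is only an illustration of the scenario, not the actual swatch code: `UsageRecord`, `always_fails` and `process_event` are hypothetical names, and the record is reduced to a value and a status.

```python
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical stand-in for an aggregated billable usage record."""
    value: int = 0
    status: str = "pending"


def always_fails(record: UsageRecord) -> bool:
    """Hypothetical producer that never manages to submit the usage."""
    return False


def process_event(record: UsageRecord, metric_value: int, submit=always_fails) -> int:
    # Only a record that is still "failed" carries its previous value forward.
    if record.status != "failed":
        record.value = 0
    # The new event is added on top of whatever is still pending (step 5).
    record.value += metric_value
    record.status = "succeeded" if submit(record) else "failed"
    return record.value


record = UsageRecord()
attempted = [process_event(record, v) for v in (100, 20)]
print(attempted, record.status)  # [100, 120] failed
```

The values the producer tries to submit are "100" and then "120", while the values that actually arrived are "100" and "20".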

What is the problem with the metrics?

swatch_billable_usage_total

The metric "swatch_billable_usage_total" will count both the values "100" and "120", when it should count only "100" and "20".
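
To make the arithmetic concrete, here is a minimal sketch comparing the two ways of incrementing a counter. The `Counter` class is a simplified stand-in for a Prometheus-style counter, not the real metrics API, and the variable names are assumptions made for this example.

```python
class Counter:
    """Simplified stand-in for a monotonically increasing counter metric."""

    def __init__(self) -> None:
        self.total = 0

    def increment(self, amount: int) -> None:
        if amount < 0:
            raise ValueError("counters can only go up")
        self.total += amount


events = [100, 20]

current = Counter()   # incremented with the accumulated usage (today's behaviour)
expected = Counter()  # incremented with the value of the new event only

accumulated = 0
for value in events:
    accumulated += value            # the usage stays "failed", so values pile up
    current.increment(accumulated)  # counts 100, then 120
    expected.increment(value)       # counts 100, then 20

print(current.total, expected.total)  # 220 120
```

The counter ends at 220 instead of 120, so the "100" from the first event is counted twice.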

swatch_producer_metered_total

This only happens when the producer fails to submit the usage. The metric will count both the values "100" and "120", when it should count "100" and "20".

Note that when processing the metric "swatch_billable_usage_total", we know that the current value is "20", so we could easily fix this metric. However, we don't have the current value "20" in "swatch_producer_metered_total", so I don't think we can fix this metric as easily.
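
One possible idea for the producer side, offered only as a starting point for discussion: if the producer remembered the total it had already counted for a usage record, it could increment the metric by the difference. `DeltaTracker` and the `"usage-1"` identifier below are hypothetical, and the state is kept in memory, which would not survive restarts or work across several replicas.

```python
class DeltaTracker:
    """Hypothetical helper: remembers the total already counted per usage record."""

    def __init__(self) -> None:
        self._counted: dict[str, int] = {}

    def delta(self, usage_id: str, accumulated_total: int) -> int:
        # Return only the part of the total that has not been counted yet.
        previous = self._counted.get(usage_id, 0)
        self._counted[usage_id] = accumulated_total
        return accumulated_total - previous


tracker = DeltaTracker()
metered_total = 0
# Totals the producer sees on each failed attempt in the scenario.
for accumulated_total in (100, 120):
    metered_total += tracker.delta("usage-1", accumulated_total)

print(metered_total)  # 120
```

A real version would also have to decide when to forget a record, for example once the usage is submitted successfully. An alternative would be to pass the current value ("20") along with the accumulated usage so that the producer does not need to keep any state.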

Acceptance Criteria

Give ideas about how to fix these two metrics
Reproduce the scenario using IQE component tests
- IQE reproducer in swatch-billable-usage (swatch_billable_usage_total)
- IQE reproducer in swatch-producer-aws (swatch_producer_metered_total)
- IQE reproducer in swatch-producer-azure (swatch_producer_metered_total)
Fix the metrics

is related to: SWATCH-3571 Spike: Investigate and Verify Prometheus Metric Accuracy for Metering (Closed)

relates to: SWATCH-3648 Spike: Investigate the accuracy of the PAYG metrics for alerting (Backlog)

Assignee: Unassigned
Reporter: Jose Carvajal Hilario
Votes: 0
Watchers: 2
Created: 2025/06/03 5:53 AM
Updated: 2025/08/15 2:46 PM
