-
Epic
-
Resolution: Unresolved
-
Critical
-
payg-monitoring-alerting-improvements
-
False
-
-
True
-
To Do
-
59% To Do, 24% In Progress, 18% Done
In order to quickly and confidently assess the health of the end-to-end metering flows, we'll add more fine-grained metering-related metrics, and a dashboard to visualize some aggregated information.
The metrics and dashboard should quickly show, at a product/metric level:
- How much of a given usage has been collected from Prometheus (metered)?
- How much usage was tallied?
- How much was processed into "billable usage"?
- How much was covered by a contract?
- How much was remitted to the marketplace APIs?
- How much usage failed remittance?
- A critical alert to the swatch-dev team if usage remittance (telemetry usage - tallied usage) decreases by x% for y orgs? The alert would be a single email listing those orgs, showing that usage is being underbilled. We can start with daily reports/emails first?
- A blocker alert to the swatch-dev team if usage remittance (telemetry usage - tallied usage) increases by x% for y orgs? The alert would be a single email listing those orgs, showing that usage is being overbilled. We can start with daily reports/emails first?
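
As a starting point for discussion, the two alert conditions above could be evaluated roughly as in the following Python sketch. Everything in it is an assumption made for illustration: the `OrgUsage` shape, the function names, the choice of comparing a current period against a previous one, and the example thresholds are hypothetical, and the real check would likely live in the existing services or in alerting rules rather than in a standalone script.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class OrgUsage:
    """Per-org value for two comparable periods (hypothetical shape)."""
    org_id: str
    previous: float  # value in the baseline period, e.g. yesterday
    current: float   # value in the period being checked, e.g. today


def percent_change(previous: float, current: float) -> Optional[float]:
    """Signed percent change, or None when there is no baseline to compare to."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


def orgs_to_report(usages: List[OrgUsage], threshold_pct: float,
                   min_orgs: int, direction: str) -> List[str]:
    """Return the org IDs for one consolidated report, or [] if no alert fires.

    direction="decrease" looks for possible underbilling,
    direction="increase" looks for possible overbilling.
    """
    flagged = []
    for usage in usages:
        change = percent_change(usage.previous, usage.current)
        if change is None:
            continue  # no baseline, so the change cannot be judged
        if direction == "decrease" and change <= -threshold_pct:
            flagged.append(usage.org_id)
        elif direction == "increase" and change >= threshold_pct:
            flagged.append(usage.org_id)
    # Alert only when enough orgs are affected; the result feeds a single email.
    return sorted(flagged) if len(flagged) >= min_orgs else []


# Made-up sample data, only to exercise the sketch.
sample = [
    OrgUsage("org-a", 100, 40),
    OrgUsage("org-b", 200, 190),
    OrgUsage("org-c", 50, 20),
    OrgUsage("org-d", 10, 30),
]
underbilled = orgs_to_report(sample, threshold_pct=50, min_orgs=2, direction="decrease")
overbilled = orgs_to_report(sample, threshold_pct=50, min_orgs=1, direction="increase")
```

Keeping the percentage threshold and the org-count threshold as separate parameters lets the critical (decrease) and blocker (increase) alerts be tuned independently, and returning one sorted list per check matches the idea of a single daily email rather than one notification per org.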
- relates to
-
SWATCH-2173 Design resilience document for handling failures in different Marketplace Billable Usage
- Closed