Bug
Resolution: Done
Normal
The metric swatch_metrics_ingested_usage_total was introduced as part of SWATCH-2297, which aims to improve monitoring and alerting for PAYG usage and billing. The epic boils down to two questions:
- Does the amount of data ingested match the amount tallied?
- Does the tallied data match the amount remitted (after accounting for contract coverage)?
The original card was written to track only Prometheus events, which appears to have been an oversight given the broader scope of what we're trying to monitor.
The products and metrics available for filtering are discovered dynamically from the metric "swatch_metrics_ingested_usage_total".
This metric is updated while swatch-metrics events are processed, but only for events coming from the "prometheus" event source.
Link: https://github.com/RedHatInsights/rhsm-subscriptions/blob/169b8b0d6cd935e0549be40bf37205650a0a70ba/src/main/java/org/candlepin/subscriptions/event/EventController.java#L380
The problem is that not every product uses that event source:
- Some products, such as "rhel-for-x86-els-payg-addon" and "rhel-for-x86-els-payg", come from the "rhelmeter" event source.
- For Ansible, the event source is "urn:redhat:source:console:app:aap-controller-billing".
Because the event source for these products is not "prometheus", their events are not counted in "swatch_metrics_ingested_usage_total".
More details/thought process: https://docs.google.com/document/d/1B_ZnGIUvsX-m-r6iIw-lU7dNOvztf2VHOL2sRfgseWE/edit?tab=t.0
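To make the gap concrete, here is a minimal sketch of the behavior described above. It does not reproduce the code in EventController: the class, record, and method names are hypothetical, the usage values are made up, and the "rosa" and "ansible-aap-managed" product tags are only illustrative. A plain sum stands in for the counter.

```java
import java.util.List;

public class IngestedUsageGap {
    // Hypothetical, simplified event: only the fields needed for this illustration.
    record Event(String eventSource, String productTag, double value) {}

    // Mimics the reported behavior: only "prometheus" events contribute to the total.
    static double countedUsage(List<Event> events) {
        return events.stream()
            .filter(e -> "prometheus".equals(e.eventSource()))
            .mapToDouble(Event::value)
            .sum();
    }

    // What we would expect to see once every saved event is counted.
    static double ingestedUsage(List<Event> events) {
        return events.stream().mapToDouble(Event::value).sum();
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
            new Event("prometheus", "rosa", 4.0),
            new Event("rhelmeter", "rhel-for-x86-els-payg", 2.0),
            new Event("urn:redhat:source:console:app:aap-controller-billing",
                "ansible-aap-managed", 3.0));
        // The rhelmeter and Ansible events are silently left out of the counted total.
        System.out.println("counted=" + countedUsage(events)
            + " ingested=" + ingestedUsage(events));
    }
}
```

In this sketch the counted total covers only the first event, so any dashboard built on it would under-report RHEL and Ansible usage.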
Acceptance Criteria
- Capture all successfully saved events, regardless of event_source.
- Include event_source as a label in swatch_metrics_ingested_usage_total so that we can build more fine-grained Grafana panels. This may also help identify product tag/billing_provider combinations with totals from different event_sources (a potential indicator of double counting).
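The two criteria above could be sketched as follows. This is an assumption-laden illustration, not the actual implementation: an in-memory map stands in for the real counter, and the class name, method names, and label set are hypothetical (the real metric may carry other labels as well).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class IngestedUsageBySource {
    // Hypothetical label set for one time series of the metric.
    record Labels(String eventSource, String productTag, String billingProvider) {}

    private final Map<Labels, Double> totals = new HashMap<>();

    // Intended to be called for every successfully saved event, regardless of event_source.
    void recordSaved(String eventSource, String productTag, String billingProvider, double value) {
        totals.merge(new Labels(eventSource, productTag, billingProvider), value, Double::sum);
    }

    // Running total for one label combination (0.0 if nothing was recorded).
    double total(String eventSource, String productTag, String billingProvider) {
        return totals.getOrDefault(new Labels(eventSource, productTag, billingProvider), 0.0);
    }

    // Product tag/billing provider combinations that have totals from more than one
    // event source: a possible (not certain) sign of double counting.
    Set<String> combosWithMultipleSources() {
        Map<String, Set<String>> sourcesByCombo = new HashMap<>();
        for (Labels l : totals.keySet()) {
            sourcesByCombo
                .computeIfAbsent(l.productTag() + "/" + l.billingProvider(), k -> new TreeSet<>())
                .add(l.eventSource());
        }
        Set<String> result = new TreeSet<>();
        sourcesByCombo.forEach((combo, sources) -> {
            if (sources.size() > 1) {
                result.add(combo);
            }
        });
        return result;
    }
}
```

Keeping event_source as a label rather than a filter means one counter serves all sources, and the per-source breakdown is available to Grafana panels without further code changes.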
Split to: SWATCH-3358 Include RHEL and Ansible in our grafana dashboards (Closed)