-
Task
-
Resolution: Unresolved
-
subs-swatch-lightning
-
We need to implement a Splunk alert that notifies us when the swatch-utilization service tries to process a message with an unsupported product ID or metric ID.
Currently, utilization messages that arrive with invalid or unsupported product or metric IDs fail to process silently, and nothing notifies us. Such failures could indicate a configuration gap where new products or metrics aren't properly registered, or they could point to upstream data issues. Without this alert, we might not realize that customer utilization data is being dropped until users report missing data.
The proposed alert uses a Splunk query to detect log patterns indicating invalid product IDs or unsupported/invalid metric IDs, then groups them by error type and the specific IDs involved.
Draft query
index=rh_rhsm namespace=rhsm-prod source="swatch-utilization" ("invalid productId" OR "unsupported metricId" OR "invalid metricId")
| eval error_type = case(
    match(_raw, "invalid productId"), "Invalid Product ID",
    match(_raw, "unsupported metricId"), "Unsupported Metric ID",
    match(_raw, "invalid metricId"), "Invalid Metric ID",
    true(), "Other"
  )
| rex field=message "productId '(?<product_id>[^']+)'"
| rex field=message "metricId '(?<metric_id>[^']+)'"
| fillnull value="N/A" product_id metric_id
| stats count by error_type, product_id, metric_id
| sort -count
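To sanity-check the classification and extraction patterns before creating the alert, the query's logic can be approximated locally. The following is a minimal Python sketch, not the service's actual behavior: the sample log lines and their wording are hypothetical, and the real swatch-utilization message format should be confirmed against production logs. Because SPL `stats ... by` omits events in which a by-field is null, the sketch substitutes an "N/A" placeholder when an ID is absent from the line.

```python
import re
from collections import Counter

# Hypothetical log lines; the real swatch-utilization message format is an assumption.
SAMPLE_LOGS = [
    "Skipping message: invalid productId 'rhel-unknown'",
    "Skipping message: unsupported metricId 'Widgets' for productId 'rosa'",
    "Skipping message: invalid metricId 'C0res' for productId 'rosa'",
    "Skipping message: invalid productId 'rhel-unknown'",
    "Processed utilization message for productId 'rosa'",
]

# Checked in the same order as the case() expression in the draft query.
ERROR_TYPES = [
    ("invalid productId", "Invalid Product ID"),
    ("unsupported metricId", "Unsupported Metric ID"),
    ("invalid metricId", "Invalid Metric ID"),
]

# Python spells named groups (?P<name>...) where SPL rex uses (?<name>...).
PRODUCT_RE = re.compile(r"productId '(?P<product_id>[^']+)'")
METRIC_RE = re.compile(r"metricId '(?P<metric_id>[^']+)'")


def classify(line):
    """Return the error label for a log line, or None if it is not an error."""
    for needle, label in ERROR_TYPES:
        if needle in line:
            return label
    return None


def extract(pattern, line, default="N/A"):
    """Return the first captured ID, or a placeholder when the ID is absent."""
    match = pattern.search(line)
    return match.group(1) if match else default


def summarize(lines):
    """Count errors by (error_type, product_id, metric_id), highest count first."""
    counts = Counter()
    for line in lines:
        error_type = classify(line)
        if error_type is None:
            continue  # not one of the three error patterns
        key = (error_type, extract(PRODUCT_RE, line), extract(METRIC_RE, line))
        counts[key] += 1
    return counts.most_common()


if __name__ == "__main__":
    for (error_type, product_id, metric_id), count in summarize(SAMPLE_LOGS):
        print(error_type, product_id, metric_id, count)
```

Running the patterns against a handful of real log lines copied from Splunk would show whether the quoting around the IDs matches what the service actually emits.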
Acceptance Criteria
- Create the Splunk alert in the RHSM Splunk workspace
- Configure alert to trigger when count > 0 over a defined time window
- In app-interface, update the runbook at docs/console.redhat.com/app-sops/rhsm/subscription-usage-notifications.md with the new alert