Story
Resolution: Done
Normal
Product / Portfolio Work
3

After multiple revamps of our alerting and monitoring, we've accumulated several duplicate alerts that cause multiple alerts to fire for a single issue. We should remove any duplicate alerts to avoid alert fatigue. As part of this deduplication, we should favor templated alerts over defining our own for each service. A templated alert is defined once and reused across multiple services, so management and updates happen in a single alert definition.
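
A minimal sketch of the templating idea: one function renders the same "service down" rule for every service, so a change to the template applies everywhere. It assumes Prometheus-style rules represented as Python dicts; the `service_down_rule` helper, the `kube_deployment_status_replicas_available` expression, the `for`/`severity` values, and the namespace/deployment split are illustrative assumptions, not the actual templated alert definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    namespace: str   # hypothetical, e.g. "kessel-prod"
    deployment: str  # hypothetical, e.g. "kessel-inventory-api"


def service_down_rule(svc: Service) -> dict:
    """Render one 'service down' alert from a single shared template.

    The expression, duration, and labels are illustrative assumptions.
    """
    return {
        "alert": f"service-down-{svc.namespace}-{svc.deployment}",
        "expr": (
            "kube_deployment_status_replicas_available"
            f'{{namespace="{svc.namespace}",deployment="{svc.deployment}"}} == 0'
        ),
        "for": "5m",
        "labels": {"severity": "critical"},
    }


SERVICES = [
    Service("kessel-prod", "kessel-inventory-api"),
    Service("kessel-prod", "kessel-relations-api"),
    Service("kessel-prod", "kessel-relations-spicedb"),
]

# One template, many services: each rule differs only in its parameters.
rules = [service_down_rule(s) for s in SERVICES]
for rule in rules:
    print(rule["alert"])
```

Adding a service means adding one `Service` entry rather than copying and editing a whole rule, which is the maintenance benefit the templated approach is meant to provide.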

| Alert to remove | Duplicated by |
|---|---|
| InventoryAPIPodDown | service-down-kessel-prod-kessel-inventory-api |
| ConnectPodDown | service-down-kessel-kafka-connect-prod |
| RelationsAPIPodDown | service-down-kessel-prod-kessel-relations-api |
| RelationsAPISinglePod | service-restart-kessel-prod-kessel-relations-api, service-down-kessel-prod-kessel-relations-api |
| spiceDBPodDown | service-down-kessel-prod-kessel-relations-spicedb |
| spiceDBSinglePod | service-restart-kessel-prod-spicedb, service-down-kessel-prod-kessel-relations-spicedb |
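
The removal table can be read as a mapping from each redundant alert to the alerts that cover it. A minimal sketch, assuming the loaded rules are a list of dicts with an `alert` key (as in a Prometheus rule group); `DUPLICATED_BY` and `remove_duplicates` are hypothetical names. A redundant alert is dropped only when every alert listed as its duplicate is present, so removal never leaves a coverage gap; the `*-Absent` alerts marked "Yes" below could be added to the mapping the same way.

```python
# Redundant alert -> alerts that cover it (copied from the removal table).
DUPLICATED_BY = {
    "InventoryAPIPodDown": ["service-down-kessel-prod-kessel-inventory-api"],
    "ConnectPodDown": ["service-down-kessel-kafka-connect-prod"],
    "RelationsAPIPodDown": ["service-down-kessel-prod-kessel-relations-api"],
    "RelationsAPISinglePod": [
        "service-restart-kessel-prod-kessel-relations-api",
        "service-down-kessel-prod-kessel-relations-api",
    ],
    "spiceDBPodDown": ["service-down-kessel-prod-kessel-relations-spicedb"],
    "spiceDBSinglePod": [
        "service-restart-kessel-prod-spicedb",
        "service-down-kessel-prod-kessel-relations-spicedb",
    ],
}


def remove_duplicates(rules: list[dict]) -> tuple[list[dict], list[str]]:
    """Drop redundant alerts whose covering alerts are all present.

    Returns (kept_rules, skipped), where skipped names redundant alerts
    that were kept because at least one covering alert is missing.
    """
    present = {r["alert"] for r in rules}
    kept: list[dict] = []
    skipped: list[str] = []
    for rule in rules:
        name = rule["alert"]
        replacements = DUPLICATED_BY.get(name)
        if replacements is None:
            kept.append(rule)  # not on the removal list
        elif all(r in present for r in replacements):
            continue  # fully covered by other alerts: safe to drop
        else:
            kept.append(rule)  # coverage incomplete: keep for manual review
            skipped.append(name)
    return kept, skipped


rules = [
    {"alert": "InventoryAPIPodDown"},
    {"alert": "service-down-kessel-prod-kessel-inventory-api"},
    {"alert": "spiceDBSinglePod"},
    {"alert": "service-down-kessel-prod-kessel-relations-spicedb"},
]
kept, skipped = remove_duplicates(rules)
print([r["alert"] for r in kept])
print(skipped)  # ['spiceDBSinglePod']: its restart alert is missing here
```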

| Alert to investigate for removal | Potentially duplicated by | Remove? |
|---|---|---|
| App-inventory-api-Absent | service-down-kessel-prod-kessel-inventory-api | Yes |
| App-relations-api-Absent | service-down-kessel-prod-kessel-relations-api | Yes |
| App-spicedb-Absent | service-down-kessel-prod-kessel-relations-spicedb | Yes |
| Kessel-Inventory-Api-Availability | KesselInventoryApiAvailability (slo-doc) | TBD |
| Kessel-Inventory-Api-Latency | KesselInventoryApiLatency (slo-document) | TBD |
| Kessel-Relations-Api-Availability | KesselRelationsApiAvailability (slo-doc) | TBD |
| Kessel-Relations-Api-LatencyUnder100ms | KesselRelationsApiLatency100ms (slo-doc) | TBD |
| Kessel-Relations-Api-LatencyUnder250ms | KesselRelationsApiLatency250ms (slo-doc) | TBD |

Done Criteria
- All duplicate alerts removed
- Potential duplicates validated, and removed if confirmed as duplicates
- Changes are replicated to FedRAMP
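
One way to check the last criterion is to compare alert names across environments. A minimal sketch, assuming the alert names for each environment have already been loaded into sets and that FedRAMP uses the same alert names as the commercial environment (if namespaces differ there, names would need normalizing first); `REMOVED_ALERTS` and `replication_report` are hypothetical names.

```python
# Alerts confirmed for removal in the first table above.
REMOVED_ALERTS = {
    "InventoryAPIPodDown",
    "ConnectPodDown",
    "RelationsAPIPodDown",
    "RelationsAPISinglePod",
    "spiceDBPodDown",
    "spiceDBSinglePod",
}


def replication_report(prod: set[str], fedramp: set[str]) -> dict[str, list[str]]:
    """Summarize drift between environments and leftover removed alerts."""
    return {
        "missing_in_fedramp": sorted(prod - fedramp),
        "extra_in_fedramp": sorted(fedramp - prod),
        "removed_still_in_prod": sorted(prod & REMOVED_ALERTS),
        "removed_still_in_fedramp": sorted(fedramp & REMOVED_ALERTS),
    }


prod = {"service-down-kessel-prod-kessel-inventory-api"}
fedramp = {"service-down-kessel-prod-kessel-inventory-api", "InventoryAPIPodDown"}
report = replication_report(prod, fedramp)
print(report)
```

An empty list under every key would indicate that the removals were applied in both environments and that the alert sets match.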