-
Story
-
Resolution: Done
-
Normal
-
None
-
Product / Portfolio Work
As a service owner, I don't need an alert on every 4xx error: many 4xx responses are expected (e.g., a user entered the wrong password or a client sent invalid parameters). We should alert only on the codes that usually mean the service is broken, misconfigured, or being abused. All 4xx errors should still be captured in a graph, without alerting on them. Our main dashboards already have such a graph, but it is not great and needs to be fixed.
4xx Codes we should alert on
401 – Unauthorized: may uncover an auth provider outage, token validation failures, or all clients suddenly failing (the original reason for this alert)
403 – Forbidden (unexpected spike): may uncover bad service or config changes, or failing internal credentials between services
404 – Not Found: may indicate bad routing
408 – Request Timeout: may uncover network issues
409 – Conflict: may indicate rare consistency issues
429 – Too Many Requests: indicates that rate limiting is being applied
All other 4xx codes should be captured in the graph but not alerted on
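
The split between "alert" and "graph only" could be sketched roughly as below. This is only an illustration in Python, not the actual alerting configuration: the names `ALERTABLE_4XX`, `classify_status`, and `split_counts` are hypothetical, and the real rule would live in whatever monitoring tool backs the dashboards. The set of codes is taken directly from the list above.

```python
# Hypothetical sketch: names and structure are assumptions, not the real alert config.
# The codes mirror the "4xx codes we should alert on" list in this ticket.
ALERTABLE_4XX = frozenset({401, 403, 404, 408, 409, 429})


def classify_status(status: int) -> str:
    """Return 'alert', 'graph_only', or 'out_of_scope' for an HTTP status code."""
    if 400 <= status <= 499:
        return "alert" if status in ALERTABLE_4XX else "graph_only"
    # Non-4xx codes are not covered by this ticket.
    return "out_of_scope"


def split_counts(counts: dict[int, int]) -> tuple[dict[int, int], dict[int, int]]:
    """Split per-status request counts into (graphed, alertable).

    Every 4xx code is graphed; only the codes in ALERTABLE_4XX feed the alert.
    """
    graphed = {code: n for code, n in counts.items() if 400 <= code <= 499}
    alertable = {code: n for code, n in graphed.items() if code in ALERTABLE_4XX}
    return graphed, alertable
```

For example, `split_counts({400: 5, 401: 2, 200: 100})` would graph both the 400 and 401 counts but pass only the 401 count to the alert. Thresholds (what counts as an "unexpected spike") are deliberately left out, since the ticket does not define them.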