Details
Bug
Resolution: Obsolete
Major
Description
Hello team,
We've recently been seeing this alert in the app-sre alerts channel:
https://coreos.slack.com/archives/CDW0S85QU/p1601294118096600
As you can see, the alert is quite spammy: it fires and resets very easily.
I would like to request two actions here from the Quay team:
- Please review and tune the alert according to your desired SLIs for the builders.
- Please provide the SRE team with some steps on how to debug and attempt to fix such a scenario. For example, what dashboard should one look at? Should we go check EC2 instances? How can we fix this problem?
Please feel free to let me know if more info is needed.
Until then, I have downgraded this alert to `medium` severity, which means it won't show up in our alerts channel and will instead go to #sd-app-sre-quay-info.
It is in the best interest of all our tenants that we keep our alerting channel very high signal and low noise.
Happy monitoring!