Task
Resolution: Won't Do
Undefined
None
None
Future Sustainability
False
False

Description
This ticket was created as a follow-up to itn-2025-00118.
The results of the Prometheus Blackbox HTTP probe strongly affect the error budget burn rate.
During that incident, the probes intermittently failed with 502/503 responses because of the load on the default router. This led to an overall error budget burn of 200%, even though the HTTP health check was only flapping and the UI was mostly working for other tenants.
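To illustrate why a flapping probe can burn the budget so quickly, here is a minimal sketch of the usual burn arithmetic. It assumes the SLI is the fraction of successful probe samples (as reported by the Blackbox exporter's `probe_success` metric) and uses a hypothetical 99.9% SLO target; the ticket does not state the actual target or window. The share of the budget consumed over the SLO window is the observed error ratio divided by the allowed error ratio, `1 - SLO`. The function name `budget_consumed` is hypothetical.

```python
def budget_consumed(probe_results: list[bool], slo_target: float) -> float:
    """Return the fraction of the error budget consumed by a series of probe samples.

    probe_results: one entry per probe sample, True if the probe succeeded
    (i.e. probe_success == 1). slo_target: hypothetical SLO, e.g. 0.999.
    """
    if not probe_results:
        return 0.0
    failed = sum(1 for ok in probe_results if not ok)
    error_ratio = failed / len(probe_results)
    allowed_error_ratio = 1.0 - slo_target  # the error budget as a ratio
    return error_ratio / allowed_error_ratio


# Hypothetical example: 1000 probe samples in the window, 2 of them failed
# (e.g. isolated 502/503 responses), measured against a 99.9% SLO.
samples = [True] * 998 + [False] * 2
print(f"{budget_consumed(samples, 0.999):.0%}")  # prints "200%"
```

With these illustrative numbers, two failed samples out of 1000 already consume 200% of the budget, even though 99.8% of the probes succeeded.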
We should consider changing the calculation so that the burn rate and error budget are less affected by transient health check failures.
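One possible option, sketched here only as an assumption and not as an agreed design, is to count a probe failure only when it is part of a sustained run of failures, so that isolated flapping does not burn the budget. The function name `debounce_failures` and the `min_consecutive` threshold are hypothetical.

```python
def debounce_failures(probe_results: list[bool], min_consecutive: int) -> list[bool]:
    """Count a failed probe only if it belongs to a run of at least
    min_consecutive consecutive failures; shorter runs are treated as success.

    probe_results: True if the probe sample succeeded.
    """
    cleaned = list(probe_results)
    run_start = None  # index where the current run of failures began
    # Append a sentinel success so a run at the end of the list is also closed.
    for i, ok in enumerate(probe_results + [True]):
        if not ok:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start < min_consecutive:
                # Run too short: treat it as transient flapping.
                for j in range(run_start, i):
                    cleaned[j] = True
            run_start = None
    return cleaned


# Hypothetical flapping probe: two isolated failures, then a 3-sample outage.
flapping = [True, False, True, False, True, False, False, False, True]
print(debounce_failures(flapping, min_consecutive=3))
# → [True, True, True, True, True, False, False, False, True]
```

The trade-off is that real outages shorter than the threshold are no longer counted, and sustained failures are only recognized after `min_consecutive` probe intervals. In PromQL, a roughly similar effect might be approximated with `max_over_time(probe_success[3m])`, which is 0 only if every sample in the range failed; this is an untested suggestion, and the range would need to match the actual probe interval.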