-
Bug
-
Resolution: Unresolved
-
Major
Now that we're in a prolonged period of disruption issues, we can clearly see that alerts are flapping in our channels.
I did some minimal investigation and found that, now that multiple pods are scraped and the queries we use take the avg, some pods report a regression while others do not. I'd expect this to be a very short-lived situation, but it seems to be happening quite a bit. It should clear within 4 hours, and perhaps that's what's happening. This could explain why the data may not match our dashboard, but it doesn't really explain why the avg would not give a consistent result.
Sorry, I can't provide more info just yet; someone needs to debug what's going on. Keep in mind that the dashboard is a live BigQuery query, whereas Sippy calculates its disruption metrics in the metrics loop every few minutes, using a cache with an expiry, and multiple pods are scraped.
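
To make the setup above easier to reason about, here is a toy simulation of several pods that each cache a regressed/not-regressed result and refresh it at staggered times. It is only a sketch, not Sippy's code and not a diagnosis: the TTL, the threshold, the refresh offsets, and the "regression window" are all made-up values, and `source_regressed` and `simulate` are hypothetical helpers.

```python
from statistics import mean

CACHE_TTL = 4     # hypothetical cache expiry, in ticks
THRESHOLD = 0.5   # hypothetical alert threshold on the averaged gauge


def source_regressed(tick):
    """Hypothetical ground truth: a regression exists during ticks 2..5 only."""
    return 1 if 2 <= tick <= 5 else 0


def simulate(pod_offsets, ticks):
    """Return, per tick, the tuple of values the pods would report when scraped."""
    cached = {offset: 0 for offset in pod_offsets}
    snapshots = []
    for tick in range(ticks):
        for offset in pod_offsets:
            # A pod recomputes only when its own cache entry expires.
            if tick >= offset and (tick - offset) % CACHE_TTL == 0:
                cached[offset] = source_regressed(tick)
        snapshots.append(tuple(cached[o] for o in pod_offsets))
    return snapshots


snapshots = simulate(pod_offsets=[0, 1, 2], ticks=12)
averages = [mean(values) for values in snapshots]
disagree = [len(set(values)) > 1 for values in snapshots]
firing = [avg > THRESHOLD for avg in averages]

for tick, (values, avg) in enumerate(zip(snapshots, averages)):
    print(tick, values, round(avg, 2), "FIRING" if firing[tick] else "ok")
```

With these made-up values the pods disagree on ticks 2-4 and 6-8, and the averaged gauge passes through fractional values while the caches catch up. That accounts for a lag relative to a live query, but on its own it does not produce repeated flapping of the avg, which matches the open question above.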