• Type: Bug
    • Resolution: Unresolved
    • Priority: Major

      Now that we're in a prolonged period where disruption is having issues, we can clearly see alerts are flapping in our channels.

      I did some minimal investigation and found that, now that multiple pods are scraped and our queries use avg, some pods report a regression while others do not. I'd expect this to be a very short-lived situation, but it seems to be happening quite a bit. It should clear within 4 hours, and perhaps that's what's happening. This can explain why the data may not match our dashboard, but it doesn't really explain why the avg would not produce a consistent result.

      Sorry, I can't provide more info just yet; someone needs to debug what's going on. Keep in mind that the dashboard is a live BigQuery query, while Sippy calculates disruption metrics in its metrics loop every few minutes, using a cache with an expiry, and multiple pods are scraped.
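
To help whoever picks this up reason about the setup, here is a minimal Python sketch of the situation described above: several pods, each serving a cached value that expires on its own schedule, with the scraped values averaged. It is not Sippy's code. The `Pod` class, the tick-based TTL, the refresh offsets, and the 0/1 regression flag are all hypothetical, chosen only to show how pods can disagree for a while after the underlying state changes.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pod:
    """Hypothetical stand-in for one pod serving a cached metric."""
    ttl: int            # cache lifetime in ticks (assumed value)
    refreshed_at: int   # tick at which the cache was last refreshed
    cached: float       # value currently served to the scraper

    def scrape(self, now, source):
        # Recompute from the source only once the cached value has expired.
        if now - self.refreshed_at >= self.ttl:
            self.cached = source(now)
            self.refreshed_at = now
        return self.cached


def regression_flag(now):
    # Hypothetical source of truth: regressed (1.0) before tick 3, clear after.
    return 1.0 if now < 3 else 0.0


def simulate(ticks=8, ttl=4):
    # Three pods whose caches were last refreshed at different ticks.
    pods = [Pod(ttl, refreshed_at=r, cached=1.0) for r in (0, -1, -2)]
    rows = []
    for now in range(ticks):
        values = [p.scrape(now, regression_flag) for p in pods]
        rows.append((now, values, round(mean(values), 2)))
    return rows


if __name__ == "__main__":
    for now, values, avg in simulate():
        print(now, values, avg)
```

In this toy run the pods disagree on ticks 3 through 5, and the average passes through intermediate values (0.67, then 0.33) before settling at 0. That matches the "some pods report a regression while others do not" observation and the mismatch with a live dashboard query. It does not by itself reproduce repeated flapping, so the open question about the avg still needs real debugging.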

              lmeyer@redhat.com Luke Meyer
              rhn-engineering-dgoodwin Devan Goodwin