OCP Technical Release Team / TRT-787

Design automation for disruption regression in BigQuery


    • Epic
    • Resolution: Unresolved
    • Major
    • Disruption Enhancements
    • To Do
    • 6% To Do, 0% In Progress, 94% Done

      Today we manually catch some regressions by eyeballing disruption graphs. 

      This Epic has two focuses: first, fixing and tuning the existing disruption logic; second, considering new methods for collecting and analyzing disruption.

       

      For the second part, the considerations are:

      Design some automation to detect these kinds of regressions and alert TRT.

      Would this live in Sippy or in something new? (Too OpenShift-specific?)

      Bear in mind we'll soon need it for component memory and CPU usage as well.

      Alerts should eventually target the SLO we discussed in the Infra arch call on Feb 7: https://docs.google.com/document/d/1QOXh7Me0w-4ad-c8HaTuQPvpG5cddUyA2b1j00H-MXQ/edit?usp=sharing

      Make sure we gain test coverage for metal and vSphere, which typically do not have the minimum of 100 runs. How can we test these more broadly?
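
      As a starting point for discussion, the detection idea above could be sketched roughly as follows. This is only an illustration under assumptions: the function and class names, the use of a 95th percentile, and the 1-second tolerance are all hypothetical and not taken from the existing disruption logic, and the inputs are plain lists of per-run disruption seconds rather than real BigQuery results. Only the 100-run minimum comes from the text above.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import Optional, Sequence

MIN_RUNS = 100  # minimum baseline run count mentioned in the description


@dataclass
class Verdict:
    status: str  # "ok", "regression", or "insufficient-data"
    baseline_p95: Optional[float]
    current_p95: Optional[float]


def p95(samples: Sequence[float]) -> float:
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return quantiles(samples, n=20, method="inclusive")[-1]


def check_regression(
    baseline: Sequence[float],
    current: Sequence[float],
    tolerance_seconds: float = 1.0,  # hypothetical tolerance, not a real SLO value
    min_runs: int = MIN_RUNS,
) -> Verdict:
    """Compare the P95 of recent runs against the P95 of a historical baseline."""
    # Platforms with few runs (for example metal or vSphere) land here
    # instead of producing a noisy verdict.
    if len(baseline) < min_runs or len(current) < 2:
        return Verdict("insufficient-data", None, None)
    base = p95(baseline)
    cur = p95(current)
    status = "regression" if cur > base + tolerance_seconds else "ok"
    return Verdict(status, base, cur)


if __name__ == "__main__":
    historical = [1.0] * 100  # made-up sample data: seconds of disruption per run
    recent = [5.0] * 20
    print(check_regression(historical, recent).status)
```

      Returning an explicit "insufficient-data" verdict keeps the question of low-volume platforms visible rather than silently skipping them; how to widen the sample for those platforms is still the open question raised above.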

            rh-ee-dmistry Deep Mistry
            rhn-engineering-dgoodwin Devan Goodwin