Epic
Resolution: Unresolved
Major
Disruption Enhancements
To Do
6% To Do, 0% In Progress, 94% Done
Today we manually catch some regressions by eyeballing disruption graphs.
This Epic has two focuses. The first is to fix and tune the existing disruption logic. The second is to consider new methods for collecting and analyzing disruption.
For the second part, the considerations are:
Design some automation to detect these kinds of regressions and alert TRT.
Would this live in Sippy or in something new? (Too OpenShift-specific?)
Bear in mind we'll soon need it for component memory and CPU usage as well.
Alerts should eventually target the SLO we discussed in the Infra arch call on Feb 7: https://docs.google.com/document/d/1QOXh7Me0w-4ad-c8HaTuQPvpG5cddUyA2b1j00H-MXQ/edit?usp=sharing
Make sure we gain test coverage for metal and vSphere, which typically do not reach the minimum of 100 runs. How can we test these more broadly?
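As a starting point for discussion, the automated detection described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the existing disruption logic: the function names, the choice of the 95th percentile, and the 2-second tolerance are all hypothetical. Only the minimum of 100 runs comes from this Epic.

```python
from statistics import quantiles

MIN_RUNS = 100  # minimum sample size mentioned in this Epic


def p95(samples):
    """Return the 95th percentile of disruption durations (seconds)."""
    # quantiles(n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(samples, n=100, method="inclusive")[94]


def check_regression(baseline, current, tolerance_seconds=2.0, min_runs=MIN_RUNS):
    """Compare current disruption samples against a historical baseline.

    Hypothetical helper: returns "insufficient-data", "regression", or "ok".
    The tolerance is an assumed placeholder, not an agreed SLO value.
    """
    # Platforms such as metal and vSphere may land here until more runs exist.
    if len(baseline) < min_runs or len(current) < min_runs:
        return "insufficient-data"
    if p95(current) > p95(baseline) + tolerance_seconds:
        return "regression"
    return "ok"
```

A result of "regression" would be the point at which TRT gets alerted. The "insufficient-data" branch makes the metal and vSphere gap explicit rather than silently passing those platforms.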