-
Story
-
Resolution: Done
-
Major
This relates to a component regression that, as far as we could tell, no existing test clearly catches. Sample runs:
All three runs show a pattern. The individual test failures look unpredictable: some tests pass while, at the same time, others fail to talk to the apiserver.
The pattern we see is one or more tests failing right at the start of e2e testing, accompanied by disruption, etcd log messages indicating slowness, and etcd leadership state changes.
Because the test failures are unpredictable, we'd like a test that catches this symptom directly. We think the safest way to do this is to look for disruption within X minutes of the first e2e test.
This would be implemented as a monitortest, likely somewhere around here: https://github.com/openshift/origin/blob/master/pkg/monitortests/kubeapiserver/legacykubeapiservermonitortests/monitortest.go
It would also be reasonable to add a new monitortest in the parent package above this level.
The test would need to do the following:
- scan final intervals for the earliest interval with source=SourceE2ETest (constant in monitorapi/types.go), and save its start time
- scan final intervals for those with source=SourceDisruption, reason=DisruptionBegan, and a backend matching one of the apiservers (kube, openshift, oauth)
- flake the test (return a failure junit result plus a success junit result) if we see any such SourceDisruption intervals within X minutes of that first e2e test.
- Choose X based on what we see in the sample runs linked above.
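
The steps above could be sketched roughly as follows. This is a standalone illustration, not the origin implementation. The `Interval` and `junitResult` types, the constant values, the backend name fragments, and the 5 minute `exampleWindow` are all hypothetical stand-ins; the real types live in monitorapi and X still has to be chosen from the sample runs. The sketch also assumes "within X minutes" means from the start of the first e2e test through X minutes after it.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// Interval is a simplified, hypothetical stand-in for the monitorapi interval type.
type Interval struct {
	Source, Reason, Backend string
	From                    time.Time
}

// junitResult is a hypothetical stand-in for a junit test case result.
type junitResult struct {
	Name   string
	Failed bool
	Output string
}

const (
	SourceE2ETest    = "E2ETest"         // assumed value; see monitorapi/types.go
	SourceDisruption = "Disruption"      // assumed value
	DisruptionBegan  = "DisruptionBegan" // assumed value
	exampleWindow    = 5 * time.Minute   // placeholder for X, not a recommendation
	testName         = "no apiserver disruption shortly after first e2e test"
)

// Assumed name fragments for the kube, openshift, and oauth apiserver backends.
var apiServerBackends = []string{"kube-api", "openshift-api", "oauth-api"}

func isAPIServerBackend(backend string) bool {
	for _, b := range apiServerBackends {
		if strings.Contains(backend, b) {
			return true
		}
	}
	return false
}

// firstE2EStart returns the start time of the earliest e2e test interval.
func firstE2EStart(intervals []Interval) (time.Time, bool) {
	var earliest time.Time
	found := false
	for _, i := range intervals {
		if i.Source == SourceE2ETest && (!found || i.From.Before(earliest)) {
			earliest, found = i.From, true
		}
	}
	return earliest, found
}

// earlyDisruptions returns apiserver DisruptionBegan intervals that start
// in [first e2e start, first e2e start + window].
func earlyDisruptions(intervals []Interval, window time.Duration) []Interval {
	start, ok := firstE2EStart(intervals)
	if !ok {
		return nil
	}
	var hits []Interval
	for _, i := range intervals {
		if i.Source != SourceDisruption || i.Reason != DisruptionBegan || !isAPIServerBackend(i.Backend) {
			continue
		}
		if !i.From.Before(start) && !i.From.After(start.Add(window)) {
			hits = append(hits, i)
		}
	}
	return hits
}

// evaluate returns a single success, or a failure plus a success (a flake).
func evaluate(intervals []Interval, window time.Duration) []junitResult {
	success := junitResult{Name: testName}
	hits := earlyDisruptions(intervals, window)
	if len(hits) == 0 {
		return []junitResult{success}
	}
	msg := fmt.Sprintf("%d apiserver disruption(s) within %s of the first e2e test", len(hits), window)
	return []junitResult{{Name: testName, Failed: true, Output: msg}, success}
}

// sampleIntervals builds made-up data purely to exercise the sketch.
func sampleIntervals() []Interval {
	t0 := time.Date(2024, 1, 1, 12, 0, 0, 0, time.UTC)
	return []Interval{
		{Source: SourceE2ETest, From: t0.Add(time.Minute)},
		{Source: SourceE2ETest, From: t0},
		{Source: SourceDisruption, Reason: DisruptionBegan, Backend: "kube-api-new-connections", From: t0.Add(-10 * time.Minute)},
		{Source: SourceDisruption, Reason: DisruptionBegan, Backend: "kube-api-new-connections", From: t0.Add(2 * time.Minute)},
		{Source: SourceDisruption, Reason: DisruptionBegan, Backend: "ingress-to-console", From: t0.Add(time.Minute)},
		{Source: SourceDisruption, Reason: "DisruptionEnded", Backend: "openshift-api-new-connections", From: t0.Add(3 * time.Minute)},
		{Source: SourceDisruption, Reason: DisruptionBegan, Backend: "oauth-api-new-connections", From: t0.Add(30 * time.Minute)},
	}
}

func main() {
	for _, r := range evaluate(sampleIntervals(), exampleWindow) {
		fmt.Printf("%s failed=%t %s\n", r.Name, r.Failed, r.Output)
	}
}
```

Returning both a failure and a success for the same test name is what makes the result a flake rather than a hard failure, which matches the intent of starting with a signal-gathering test before X is tuned.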