Story
Resolution: Unresolved
Major
Future Sustainability
We are struggling with failure patterns where 20-60 tests fail in a set of job runs.
Examples:
- an upgrade failure can lead to 20+ failed tests: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade/1997663581734178816
- an install failure can lead to 7+ failed tests: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-static-ovn/1998074217861484544
- some bugs can cause mass failures, e.g. OCPBUGS-66420: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-upgrade-fips/1998279551385341952
Many problems hit multiple variant combos at once, meaning you're quickly sorting through hundreds of component readiness regressions.
Would it be feasible to designate one test which, if it fails, overrides all other failures in that run?
For example, if "upgrade: [sig-cluster-lifecycle] Cluster completes upgrade" fails, that is the only test component readiness will show a regression for; all other test failures in those runs do not trigger regressions.
Alternatively, assume a "should not have mass e2e test failures" monitortest: if it fails, none of the other regressions count and only this one test goes regressed.
This would reduce the granularity of regressing the right component, but accurate attribution is virtually unheard of in these situations anyway, other than perhaps the right operator sometimes showing regressed for an install failure (along with several other components). The overhead of getting the resulting bug to the right team would likely be less than the overhead of what we're sorting through today.
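A minimal sketch of what the override logic might look like, in Go. The function and variable names here are hypothetical, and treating the upgrade test as an override is the proposal above, not existing component readiness behavior:

```go
package main

import "fmt"

// overrideTests are tests whose failure is treated as the root cause of a
// mass-failure run; all other failures in the same run would be suppressed.
// Membership in this list is an assumption for illustration.
var overrideTests = map[string]bool{
	"upgrade: [sig-cluster-lifecycle] Cluster completes upgrade": true,
}

// suppressMassFailures returns the failed tests that should still count
// toward regressions. If any override test failed in the run, only the
// override tests are reported and the rest are dropped. A "should not have
// mass e2e test failures" monitortest could slot in the same way: its
// failure would suppress everything else in the run.
func suppressMassFailures(failedTests []string) []string {
	var overrides []string
	for _, t := range failedTests {
		if overrideTests[t] {
			overrides = append(overrides, t)
		}
	}
	if len(overrides) > 0 {
		return overrides
	}
	return failedTests
}

func main() {
	run := []string{
		"upgrade: [sig-cluster-lifecycle] Cluster completes upgrade",
		"[sig-network] some networking test",
		"[sig-storage] some storage test",
	}
	// Only the upgrade test counts; the other two are suppressed.
	fmt.Println(suppressMassFailures(run))
}
```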
Additional thoughts:
- if we could obtain the count of failed tests per job run and display it on test details reports with a different shade of red, that would be helpful (a rough shading sketch follows this list)
- if the regression tracker stored the set of job runs observed for a regression, we could enhance the tooling that tries to tie regressions to an existing triage record. The current mechanism is helpful at times, but often difficult to trust without deep inspection. Matching on actual job runs would add a layer of confidence (see the overlap sketch below).
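For the shading idea, something as simple as bucketing the failure count into color intensities would probably do. The buckets and hex values below are illustrative assumptions, not anything that exists in the current reports:

```go
package main

import "fmt"

// shadeForFailureCount maps a job run's failed-test count to a background
// color for test details reports. Bucket boundaries and colors are
// assumptions chosen for illustration.
func shadeForFailureCount(failedCount int) string {
	switch {
	case failedCount >= 20:
		return "#7f0000" // mass failure: darkest red
	case failedCount >= 5:
		return "#cc3333" // likely collateral damage
	case failedCount >= 1:
		return "#ff9999" // isolated failure: light red
	default:
		return "#ffffff" // no failures
	}
}

func main() {
	for _, n := range []int{0, 2, 12, 45} {
		fmt.Printf("%d failed tests -> %s\n", n, shadeForFailureCount(n))
	}
}
```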
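And a sketch of how matching on stored job runs could score a regression against a triage record. The types are hypothetical, since persisting observed job runs is exactly the enhancement being proposed; the overlap metric is one plausible choice:

```go
package main

import "fmt"

// Regression is a hypothetical record that, per the proposal, would persist
// the job runs the regression was observed in.
type Regression struct {
	TestID  string
	JobRuns map[string]bool // job run IDs the regression was observed in
}

// TriageRecord is a hypothetical shape for an existing triage entry.
type TriageRecord struct {
	BugURL  string
	JobRuns map[string]bool // job runs already triaged to this bug
}

// jobRunOverlap returns the fraction of the regression's job runs that the
// triage record also covers. A high overlap ties the regression to the bug
// with more confidence than matching on test name or time window alone.
func jobRunOverlap(r Regression, t TriageRecord) float64 {
	if len(r.JobRuns) == 0 {
		return 0
	}
	shared := 0
	for run := range r.JobRuns {
		if t.JobRuns[run] {
			shared++
		}
	}
	return float64(shared) / float64(len(r.JobRuns))
}

func main() {
	reg := Regression{
		TestID:  "some-test",
		JobRuns: map[string]bool{"run-1": true, "run-2": true},
	}
	tri := TriageRecord{
		BugURL:  "https://example.invalid/OCPBUGS-66420", // placeholder URL
		JobRuns: map[string]bool{"run-1": true},
	}
	fmt.Printf("overlap: %.2f\n", jobRunOverlap(reg, tri)) // 0.50
}
```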