- Bug
- Resolution: Won't Do
- Normal
- None
- 4.12.0
- None
- Critical
- None
- Rejected
- False
Description of problem:
Since rebasing openshift/origin on Kubernetes 1.25 (and Ginkgo v2), about 30% of jobs fail because openshift-tests slows down by a factor of 4. Spot-checking individual tests, we see tests that might normally take 6s taking over 1m.
We reverted the rebase once already, and the problem came back as soon as we unreverted it (see chart below). See also TRT-643 and this Slack thread.
Example: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.12-e2e-aws-ovn-upgrade/1585545649024143360 from https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-aws-ovn-upgrade-4.12-micro-release-openshift-release-analysis-aggregator/1585545656603250688
09:37:34 upgrade starts
10:48:46 upgrade completes (total 1h10m, which is a typical timing)
10:50:18 conformance tests start running
12:42:11 tests are killed due to timeout (1h52m later)
We only completed about half the tests in twice the usual amount of time (4x slower).
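The timings in the example run can be checked with a short calculation. This is only an illustrative sketch: the timestamps are copied from the run above, the helper name `elapsed` is made up for this example, and the "half the tests in twice the time" figures are the rough estimates stated in this report, not measured values.

```python
from datetime import datetime, timedelta

FMT = "%H:%M:%S"


def elapsed(start: str, end: str) -> timedelta:
    """Return end - start for two same-day HH:MM:SS timestamps."""
    return datetime.strptime(end, FMT) - datetime.strptime(start, FMT)


# Timestamps copied from the example job in this report.
upgrade = elapsed("09:37:34", "10:48:46")
conformance = elapsed("10:50:18", "12:42:11")

# Rough estimates from the report (assumptions, not measurements):
# about half of the tests finished in about twice the usual time.
fraction_completed = 0.5
time_ratio = 2.0
slowdown = time_ratio / fraction_completed

print(f"upgrade duration:     {upgrade}")
print(f"conformance duration: {conformance}")
print(f"estimated slowdown:   {slowdown:.0f}x")
```

The computed durations come out within about a minute of the 1h10m and 1h52m quoted above, and dividing the time ratio by the fraction of tests completed gives the 4x figure.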
Version-Release number of selected component (if applicable):
4.12.0
How reproducible:
About 30% of the time
Steps to Reproduce:
1. Trigger aggregated nightly or CI jobs on the rebase
Actual results:
About 30% of jobs take around 5h and eventually time out.
Expected results:
Additional info:
- relates to: TRT-643 Some e2e test steps taking > 2x as long (Closed)