Bug
Resolution: Unresolved
Normal
None
4.18.0
No
False
Component Readiness has found a potential regression in the following test:
operator conditions etcd
Significant regression detected.
Fisher's Exact probability of a regression: 99.95%.
Test pass rate dropped from 97.85% to 86.36%.
Sample (being evaluated) Release: 4.18
Start Time: 2025-01-13T00:00:00Z
End Time: 2025-01-20T12:00:00Z
Success Rate: 86.36%
Successes: 19
Failures: 3
Flakes: 0
Base (historical) Release: 4.17
Start Time: 2024-09-01T00:00:00Z
End Time: 2024-10-01T23:59:59Z
Success Rate: 97.85%
Successes: 91
Failures: 2
Flakes: 0
View the test details report for additional context.
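As a quick sanity check on the figures above, the reported success rates can be recomputed from the raw success and failure counts. This is a minimal sketch, not Component Readiness code: the `pass_rate` helper is a hypothetical name, and flakes are left out of the calculation only because both windows report zero flakes.

```python
def pass_rate(successes: int, failures: int) -> float:
    """Return the pass rate as a percentage of all counted runs."""
    total = successes + failures
    if total == 0:
        raise ValueError("no runs recorded")
    return 100.0 * successes / total


# Counts copied from the report above (flakes are 0 in both windows).
sample_rate = pass_rate(successes=19, failures=3)  # 4.18 sample
base_rate = pass_rate(successes=91, failures=2)    # 4.17 base

print(f"sample: {sample_rate:.2f}%")  # sample: 86.36%
print(f"base:   {base_rate:.2f}%")    # base:   97.85%
print(f"drop:   {base_rate - sample_rate:.2f} percentage points")
```

Note that the sample window contains only 22 runs, so each additional failure moves the sample pass rate by roughly 4.5 percentage points.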
Further analysis found that this is caused by extremely long Boskos lease delays, in the range of 1-2 hours, which leave the serial suite without enough time to complete.
Example:
https://storage.googleapis.com/test-platform-results/logs/periodic-ci-openshift-release-master-ci-4.18-e2e-azure-ovn-serial/1880322835826610176/build-log.txt
INFO[2025-01-17T18:43:18Z] Acquiring leases for test e2e-azure-ovn-serial: [azure-2-quota-slice]
INFO[2025-01-17T19:49:52Z] Acquired 1 lease(s) for azure-2-quota-slice: [centralus--azure-2-quota-slice-26]
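The lease wait in this example can be measured directly from the two log lines. The sketch below is illustrative only: the `log_time` helper and the regular expression are assumptions made for this ticket, not part of ci-operator or Boskos, and they rely only on the bracketed UTC timestamp format visible in the lines above.

```python
import re
from datetime import datetime

# Matches the bracketed UTC timestamp in each build-log line.
TIMESTAMP = re.compile(r"\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\]")


def log_time(line: str) -> datetime:
    """Extract the timestamp from a single build-log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S")


requested = log_time(
    "INFO[2025-01-17T18:43:18Z] Acquiring leases for test "
    "e2e-azure-ovn-serial: [azure-2-quota-slice]"
)
acquired = log_time(
    "INFO[2025-01-17T19:49:52Z] Acquired 1 lease(s) for "
    "azure-2-quota-slice: [centralus--azure-2-quota-slice-26]"
)

lease_wait = acquired - requested
print(lease_wait)  # 1:06:34
```

In this run the job waited 1 hour, 6 minutes and 34 seconds for a lease before any cluster installation or testing could start.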
The dashboard shows that the azure-2 account is often maxed out at its 57-cluster limit (select azure-2 in the top panel): https://grafana-route-ci-grafana.apps.ci.l2s4.p1.openshiftapps.com/d/628a36ebd9ef30d67e28576a5d5201fd/boskos-dashboard?orgId=1&from=now-7d&to=now
The first option would be to rebalance Azure jobs across the available clusters. There appears to be a tool