Bug
Resolution: Unresolved
Normal
None
4.13, 4.14
Moderate
No
Sprint 244, Sprint 245, Sprint 246, Sprint 247, Sprint 248, Sprint 249, Sprint 250, Sprint 251, Sprint 252, Sprint 253
10
Rejected
False
Description of problem
CI is flaky because of test failures such as the following:
TestAll/parallel/TestManagedDNSToUnmanagedDNSIngressController
=== RUN   TestAll/parallel/TestManagedDNSToUnmanagedDNSIngressController
    util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    util_test.go:106: retrying client call due to: Get "http://168.61.75.99": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
    util_test.go:551: verified connectivity with workload with req http://168.61.75.99 and response 200
    unmanaged_dns_test.go:148: Updating ingresscontroller managed-migrated to dnsManagementPolicy=Unmanaged
    unmanaged_dns_test.go:161: Waiting for stable conditions on ingresscontroller managed-migrated after dnsManagementPolicy=Unmanaged
    unmanaged_dns_test.go:177: verifying conditions on DNSRecord zone {ID:/subscriptions/d38f1e38-4bed-438e-b227-833f997adf6a/resourceGroups/ci-op-k8s8zfit-04a70-rdnbw-rg/providers/Microsoft.Network/privateDnsZones/ci-op-k8s8zfit-04a70.ci.azure.devcluster.openshift.com Tags:map[]}
    unmanaged_dns_test.go:177: DNSRecord zone expected to have status=Unknown but got status=True
    panic.go:522: deleted ingresscontroller managed-migrated
This particular failure comes from https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/970/pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator/1690101593501863936. Search.ci has other similar failures.
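Reading the log, the failing step looks like a race: the test flips dnsManagementPolicy to Unmanaged and then expects the DNSRecord zone condition to already read status=Unknown, but still observes the pre-transition status=True. Below is a minimal, hypothetical Go sketch of a poll-based condition check that would tolerate the propagation delay. The zoneCondition type, the waitForZoneCondition helper, and the "Published" condition name are illustrative stand-ins, not the operator's actual test code (which lives in unmanaged_dns_test.go and util_test.go):

    package main

    import (
    	"context"
    	"fmt"
    	"time"
    )

    // zoneCondition is a hypothetical stand-in for the operator's
    // DNSZoneCondition type from github.com/openshift/api/operatoringress/v1.
    type zoneCondition struct {
    	Type   string
    	Status string
    }

    // waitForZoneCondition polls getConditions until the named condition
    // reaches wantStatus or ctx expires. Polling, instead of asserting once,
    // absorbs the window during which the operator is still transitioning the
    // DNSRecord after dnsManagementPolicy changes.
    func waitForZoneCondition(ctx context.Context, getConditions func() ([]zoneCondition, error), condType, wantStatus string) error {
    	ticker := time.NewTicker(5 * time.Second)
    	defer ticker.Stop()
    	for {
    		conds, err := getConditions()
    		if err == nil {
    			for _, c := range conds {
    				if c.Type == condType && c.Status == wantStatus {
    					return nil
    				}
    			}
    		}
    		select {
    		case <-ctx.Done():
    			return fmt.Errorf("timed out waiting for condition %s=%s: %w", condType, wantStatus, ctx.Err())
    		case <-ticker.C:
    		}
    	}
    }

    func main() {
    	// Simulate a condition that flips from True to Unknown after a delay,
    	// which is the transition the failing test appears to race against.
    	start := time.Now()
    	get := func() ([]zoneCondition, error) {
    		if time.Since(start) < 12*time.Second {
    			return []zoneCondition{{Type: "Published", Status: "True"}}, nil
    		}
    		return []zoneCondition{{Type: "Published", Status: "Unknown"}}, nil
    	}
    	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
    	defer cancel()
    	if err := waitForZoneCondition(ctx, get, "Published", "Unknown"); err != nil {
    		fmt.Println("FAIL:", err)
    		return
    	}
    	fmt.Println("condition reached Unknown")
    }

The design point is that condition checks following a spec change should retry until a deadline, the same way util_test.go:106 already retries HTTP requests in the log above.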
Version-Release number of selected component (if applicable)
I have seen this in recent 4.14 CI job runs. I also found a failure from February 2023, before the 4.13 branch cut in March 2023, which means these failures go back at least to 4.13: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/874/pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator/1626100610514292736
How reproducible
Presently, search.ci shows the following stats for the past 14 days:
Found in 6.98% of runs (14.29% of failures) across 43 total runs and 1 jobs (48.84% failed)
pull-ci-openshift-cluster-ingress-operator-master-e2e-azure-operator (all) - 43 runs, 49% failed, 14% of failures match = 7% impact
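Reading those numbers together: roughly 21 of the 43 runs failed (48.84%), and about 3 of those 21 failures match this symptom (14.29%), which works out to 3/43 ≈ 7% of all runs.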
Steps to Reproduce
1. Post a PR and have bad luck.
2. Check search.ci using the link above.
Actual results
CI fails.
Expected results
CI passes, or fails on some other test failure.