-
Bug
-
Resolution: Unresolved
-
Minor
-
4.20
-
Quality / Stability / Reliability
-
False
-
-
3
-
None
-
None
-
None
-
Rejected
-
NI&D Sprint 275, NI&D Sprint 276
-
2
-
Done
-
Release Note Not Required
-
N/A
-
None
-
None
-
None
-
None
Description of problem
CI is flaky because of test failures such as the following:
=== RUN   TestAll/serial/TestGatewayAPI/testGatewayAPIResourcesProtection/Pod_binding_required
    gateway_api_test.go:401: failed to verify VAP protection for creating gateway API CRD "gateways.gateway.networking.k8s.io": unexpected error received while creating CRD "gateways.gateway.networking.k8s.io": Post "https://api.ci-op-f2l3wlbj-43abb.origin-ci-int-aws.dev.rhcloud.com:6443/apis/apiextensions.k8s.io/v1/customresourcedefinitions": read tcp 10.128.220.16:47032->3.143.193.91:6443: read: connection reset by peer
This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1257/pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator/1954837470486990848. Search.ci has other similar failures.
Version-Release number of selected component (if applicable)
I have seen this in 4.20 CI jobs.
How reproducible
Presently, search.ci shows the following stats for the past two days:
pull-ci-openshift-cluster-ingress-operator-master-e2e-aws-operator (all) - 33 runs, 39% failed, 15% of failures match = 6% impact
Steps to Reproduce
1. Post a PR and have bad luck.
2. Check search.ci.
Actual results
CI fails.
Expected results
CI passes, or fails on some other test failure.
Additional info
The test might just need a retry for the "Verify that GatewayAPI CRD creation is forbidden" step. This failure comes from this code:
    // Verify that GatewayAPI CRD creation is forbidden.
    for i := range testCRDs {
        if err := wait.PollUntilContextTimeout(context.Background(), 2*time.Second, 30*time.Second, false, func(ctx context.Context) (bool, error) {
            if err := tc.kclient.Create(ctx, testCRDs[i]); err != nil {
                if kerrors.IsAlreadyExists(err) {
                    // VAP was disabled and re-enabled at the beginning of the test.
                    // It may take some time for the API server to process this change and register the VAP.
                    // As a result, we might encounter a "CRD X already exists" error.
                    // To handle this, we allow the API server some time to catch up.
                    t.Logf("Failed to create CRD %q: %v; retrying...", testCRDs[i].Name, err)
                    return false, nil
                }
                if !strings.Contains(err.Error(), tc.expectedErrMsg) {
                    return false, fmt.Errorf("unexpected error received while creating CRD %q: %v", testCRDs[i].Name, err)
                }
                return true, nil
            }
            return false, fmt.Errorf("admission error is expected while creating CRD %q but not received", testCRDs[i].Name)
        }); err != nil {
            t.Errorf("failed to verify VAP protection for creating gateway API CRD %q: %v", testCRDs[i].Name, err)
        }
    }
Note that the polling loop did not hit a timeout; rather, it exited when it received an unexpected connection-reset error from the API server:
    if !strings.Contains(err.Error(), tc.expectedErrMsg) {
        return false, fmt.Errorf("unexpected error received while creating CRD %q: %v", testCRDs[i].Name, err)
This error check could be changed to log and retry on connection errors instead of failing immediately, which would make the test more resilient to transient API-server errors.
- is cloned by
-
OCPBUGS-60620 CI fails on TestGatewayAPI/testGatewayAPIObjects
-
- Verified
-
- links to