Bug
Resolution: Unresolved
Normal
None
4.15, 4.18
Description of problem
CI is flaky because of test failures such as the following:
{Failed
=== RUN TestCreateClusterV2/Main/break-glass-credentials/independent_signers
control_plane_pki_operator.go:92: generating new break-glass credentials for more than one signer
pki.go:75: loading certificate/key pair from disk for signer customer-break-glass, use $REGENERATE_PKI to generate new ones
control_plane_pki_operator.go:201: creating CSR "o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083" for signer "customer-break-glass", requesting client auth usages
control_plane_pki_operator.go:211: creating CSRA e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 to trigger automatic approval of the CSR
control_plane_pki_operator.go:218: Successfully waited for CSR "o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083" to be approved and signed in 1s
control_plane_pki_operator.go:130: validating that the client certificate provides the appropriate access
control_plane_pki_operator.go:116: amending the existing kubeconfig to use break-glass client certificate credentials
control_plane_pki_operator.go:133: issuing SSR to identify the subject we are given using the client certificate
control_plane_pki_operator.go:153: ensuring that the SSR identifies the client certificate as having system:masters power and correct username
pki.go:75: loading certificate/key pair from disk for signer sre-break-glass, use $REGENERATE_PKI to generate new ones
control_plane_pki_operator.go:201: creating CSR "zpxsfoj4y5ltot4gswx1k5t01p50a3bcz3um2rtgbi" for signer "sre-break-glass", requesting client auth usages
control_plane_pki_operator.go:211: creating CSRA e2e-clusters-f8jng-example-989zr/zpxsfoj4y5ltot4gswx1k5t01p50a3bcz3um2rtgbi to trigger automatic approval of the CSR
control_plane_pki_operator.go:218: Successfully waited for CSR "zpxsfoj4y5ltot4gswx1k5t01p50a3bcz3um2rtgbi" to be approved and signed in 1s
control_plane_pki_operator.go:130: validating that the client certificate provides the appropriate access
control_plane_pki_operator.go:116: amending the existing kubeconfig to use break-glass client certificate credentials
control_plane_pki_operator.go:133: issuing SSR to identify the subject we are given using the client certificate
control_plane_pki_operator.go:153: ensuring that the SSR identifies the client certificate as having system:masters power and correct username
control_plane_pki_operator.go:96: revoking the "customer-break-glass" signer
pki.go:75: loading certificate/key pair from disk for signer customer-break-glass, use $REGENERATE_PKI to generate new ones
control_plane_pki_operator.go:253: creating CRR e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 to trigger signer certificate revocation
eventually.go:100: Failed to get *v1alpha1.CertificateRevocationRequest: context deadline exceeded
control_plane_pki_operator.go:260: Failed to wait for CRR e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 to complete in 10m0s: context deadline exceeded
eventually.go:220: observed *v1alpha1.CertificateRevocationRequest e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 invalid at RV 78205 after 10m0s: incorrect condition: wanted PreviousCertificatesRevoked=True, got PreviousCertificatesRevoked=False: WaitingForAvailable(Previous signer certificate not yet revoked.)
control_plane_pki_operator.go:260: *v1alpha1.CertificateRevocationRequest e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 conditions:
control_plane_pki_operator.go:260: LeafCertificatesRegenerated=True: AsExpected(All leaf certificates are re-generated.)
control_plane_pki_operator.go:260: PreviousCertificatesRevoked=False: WaitingForAvailable(Previous signer certificate not yet revoked.)
control_plane_pki_operator.go:260: NewCertificatesTrusted=True: AsExpected(New signer certificate e2e-clusters-f8jng-example-989zr/customer-system-admin-signer trusted.)
control_plane_pki_operator.go:260: RootCertificatesRegenerated=True: AsExpected(Signer certificate e2e-clusters-f8jng-example-989zr/customer-system-admin-signer regenerated.)
control_plane_pki_operator.go:260: SignerClassValid=True: AsExpected(Signer class "customer-break-glass" known.)
--- FAIL: TestCreateClusterV2/Main/break-glass-credentials/independent_signers (602.07s)
}
This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1147/pull-ci-openshift-cluster-ingress-operator-master-e2e-hypershift/1857099476942983168. Search.ci has other similar failures.
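At the 10m0s deadline, every condition on the CertificateRevocationRequest except PreviousCertificatesRevoked was True. The following Go sketch transcribes the conditions from the log and picks out the ones that are not yet True. The Condition type and both helper functions are hypothetical simplifications for illustration; they are not the HyperShift API types and not the e2e test's actual wait logic.

```go
package main

import "fmt"

// Condition is a hypothetical, simplified stand-in for a status condition;
// it is NOT the HyperShift CertificateRevocationRequest API type.
type Condition struct {
	Type    string
	Status  string // "True" or "False"
	Reason  string
	Message string
}

// crrConditionsFromLog (hypothetical) transcribes the CRR conditions
// reported in the failing run at the 10m0s deadline.
func crrConditionsFromLog() []Condition {
	return []Condition{
		{"LeafCertificatesRegenerated", "True", "AsExpected", "All leaf certificates are re-generated."},
		{"PreviousCertificatesRevoked", "False", "WaitingForAvailable", "Previous signer certificate not yet revoked."},
		{"NewCertificatesTrusted", "True", "AsExpected", "New signer certificate e2e-clusters-f8jng-example-989zr/customer-system-admin-signer trusted."},
		{"RootCertificatesRegenerated", "True", "AsExpected", "Signer certificate e2e-clusters-f8jng-example-989zr/customer-system-admin-signer regenerated."},
		{"SignerClassValid", "True", "AsExpected", `Signer class "customer-break-glass" known.`},
	}
}

// notYetTrue (hypothetical) returns the conditions whose status is not
// "True"; in this sketch those are treated as what the wait is stuck on.
func notYetTrue(conds []Condition) []Condition {
	var pending []Condition
	for _, c := range conds {
		if c.Status != "True" {
			pending = append(pending, c)
		}
	}
	return pending
}

func main() {
	// Print each pending condition in the same shape the test log uses.
	for _, c := range notYetTrue(crrConditionsFromLog()) {
		fmt.Printf("%s=%s: %s(%s)\n", c.Type, c.Status, c.Reason, c.Message)
	}
}
```

Run as-is, it prints the single line `PreviousCertificatesRevoked=False: WaitingForAvailable(Previous signer certificate not yet revoked.)`, which is the condition the test reported as incorrect.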
Version-Release number of selected component (if applicable)
I have seen this in 4.15 and master CI jobs.
How reproducible
Presently, search.ci shows the following stats for the past 14 days:
pull-ci-openshift-cluster-ingress-operator-master-e2e-hypershift (all) - 47 runs, 43% failed, 5% of failures match = 2% impact
pull-ci-openshift-machine-config-operator-master-e2e-hypershift (all) - 67 runs, 30% failed, 15% of failures match = 4% impact
pull-ci-openshift-cluster-node-tuning-operator-master-e2e-hypershift (all) - 34 runs, 24% failed, 13% of failures match = 3% impact
pull-ci-openshift-hypershift-main-e2e-aws (all) - 270 runs, 51% failed, 4% of failures match = 2% impact
pull-ci-openshift-hypershift-main-e2e-aks (all) - 265 runs, 49% failed, 6% of failures match = 3% impact
pull-ci-openshift-csi-operator-master-hypershift-aws-e2e-external (all) - 57 runs, 56% failed, 6% of failures match = 4% impact
periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn (all) - 104 runs, 50% failed, 4% of failures match = 2% impact
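The impact figure appears to be the product of the two preceding percentages; for the ingress operator job, 43% failed × 5% of failures matching ≈ 2% of all runs. The Go sketch below recomputes the impact from the listed percentages, assuming that relationship rather than documenting how search.ci actually computes it. The jobStats type and impactPct function are hypothetical names.

```go
package main

import "fmt"

// jobStats (hypothetical) holds the percentages search.ci lists for one job.
type jobStats struct {
	name      string
	failedPct float64 // share of runs that failed
	matchPct  float64 // share of those failures that match the search
}

// impactPct estimates the share of ALL runs that hit the matching failure,
// ASSUMING impact = failed% × match% / 100.
func impactPct(s jobStats) float64 {
	return s.failedPct * s.matchPct / 100
}

func main() {
	// Percentages copied from the search.ci stats above.
	jobs := []jobStats{
		{"pull-ci-openshift-cluster-ingress-operator-master-e2e-hypershift", 43, 5},
		{"pull-ci-openshift-machine-config-operator-master-e2e-hypershift", 30, 15},
		{"pull-ci-openshift-cluster-node-tuning-operator-master-e2e-hypershift", 24, 13},
		{"pull-ci-openshift-hypershift-main-e2e-aws", 51, 4},
		{"pull-ci-openshift-hypershift-main-e2e-aks", 49, 6},
		{"pull-ci-openshift-csi-operator-master-hypershift-aws-e2e-external", 56, 6},
		{"periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn", 50, 4},
	}
	for _, j := range jobs {
		fmt.Printf("%s: ~%.1f%% impact\n", j.name, impactPct(j))
	}
}
```

Because the inputs are themselves rounded, recomputed values can differ slightly from the listed impact: for the csi-operator job, 56% × 6% gives about 3.4%, while search.ci lists 4%.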
Steps to Reproduce
1. Post a PR and have bad luck.
2. Check search.ci: https://search.dptools.openshift.org/?search=FAIL%3A+TestCreateClusterV2%2FMain%2Fbreak-glass-credentials%2Findependent_signers&maxAge=336h&context=1&type=build-log&name=hypershift&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
Actual results
CI fails.
Expected results
CI passes, or fails only because of some other test.