OpenShift Bugs / OCPBUGS-44582

CI fails on "TestCreateClusterV2/Main/break-glass-credentials/independent_signers"


    • Bug
    • Resolution: Unresolved
    • Normal
    • None
    • 4.15, 4.18
    • HyperShift
    • Moderate
    • None
    • Hypershift Sprint 263
    • 1
    • False

      Description of problem

      CI is flaky because of test failures such as the following:

      {Failed  === RUN   TestCreateClusterV2/Main/break-glass-credentials/independent_signers
          control_plane_pki_operator.go:92: generating new break-glass credentials for more than one signer
          pki.go:75: loading certificate/key pair from disk for signer customer-break-glass, use $REGENERATE_PKI to generate new ones
          control_plane_pki_operator.go:201: creating CSR "o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083" for signer "customer-break-glass", requesting client auth usages
          control_plane_pki_operator.go:211: creating CSRA e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 to trigger automatic approval of the CSR
          control_plane_pki_operator.go:218: Successfully waited for CSR "o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083" to be approved and signed in 1s
          control_plane_pki_operator.go:130: validating that the client certificate provides the appropriate access
          control_plane_pki_operator.go:116: amending the existing kubeconfig to use break-glass client certificate credentials
          control_plane_pki_operator.go:133: issuing SSR to identify the subject we are given using the client certificate
          control_plane_pki_operator.go:153: ensuring that the SSR identifies the client certificate as having system:masters power and correct username
          pki.go:75: loading certificate/key pair from disk for signer sre-break-glass, use $REGENERATE_PKI to generate new ones
          control_plane_pki_operator.go:201: creating CSR "zpxsfoj4y5ltot4gswx1k5t01p50a3bcz3um2rtgbi" for signer "sre-break-glass", requesting client auth usages
          control_plane_pki_operator.go:211: creating CSRA e2e-clusters-f8jng-example-989zr/zpxsfoj4y5ltot4gswx1k5t01p50a3bcz3um2rtgbi to trigger automatic approval of the CSR
          control_plane_pki_operator.go:218: Successfully waited for CSR "zpxsfoj4y5ltot4gswx1k5t01p50a3bcz3um2rtgbi" to be approved and signed in 1s
          control_plane_pki_operator.go:130: validating that the client certificate provides the appropriate access
          control_plane_pki_operator.go:116: amending the existing kubeconfig to use break-glass client certificate credentials
          control_plane_pki_operator.go:133: issuing SSR to identify the subject we are given using the client certificate
          control_plane_pki_operator.go:153: ensuring that the SSR identifies the client certificate as having system:masters power and correct username
          control_plane_pki_operator.go:96: revoking the "customer-break-glass" signer
          pki.go:75: loading certificate/key pair from disk for signer customer-break-glass, use $REGENERATE_PKI to generate new ones
          control_plane_pki_operator.go:253: creating CRR e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 to trigger signer certificate revocation
          eventually.go:100: Failed to get *v1alpha1.CertificateRevocationRequest: context deadline exceeded
          control_plane_pki_operator.go:260: Failed to wait for CRR e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 to complete in 10m0s: context deadline exceeded
          eventually.go:220: observed *v1alpha1.CertificateRevocationRequest e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 invalid at RV 78205 after 10m0s: incorrect condition: wanted PreviousCertificatesRevoked=True, got PreviousCertificatesRevoked=False: WaitingForAvailable(Previous signer certificate not yet revoked.)
          control_plane_pki_operator.go:260: *v1alpha1.CertificateRevocationRequest e2e-clusters-f8jng-example-989zr/o7ult40rg5ngulsfotl96hvl0hz0lm1qu7g9x7pz083 conditions:
          control_plane_pki_operator.go:260: LeafCertificatesRegenerated=True: AsExpected(All leaf certificates are re-generated.)
          control_plane_pki_operator.go:260: PreviousCertificatesRevoked=False: WaitingForAvailable(Previous signer certificate not yet revoked.)
          control_plane_pki_operator.go:260: NewCertificatesTrusted=True: AsExpected(New signer certificate e2e-clusters-f8jng-example-989zr/customer-system-admin-signer trusted.)
          control_plane_pki_operator.go:260: RootCertificatesRegenerated=True: AsExpected(Signer certificate e2e-clusters-f8jng-example-989zr/customer-system-admin-signer regenerated.)
          control_plane_pki_operator.go:260: SignerClassValid=True: AsExpected(Signer class "customer-break-glass" known.)
                  --- FAIL: TestCreateClusterV2/Main/break-glass-credentials/independent_signers (602.07s)
      }
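To spot the blocking condition quickly in output like the above, the condition lines can be parsed mechanically. The following is a minimal sketch, assuming the "Type=Status: Reason(Message)" shape seen in this log; the `condition` struct and the `parseCondition` and `notTrue` helpers are hypothetical names for illustration and are not part of the HyperShift test code or the v1alpha1 API.

```go
package main

import (
	"fmt"
	"regexp"
)

// condition is a hypothetical struct for this sketch only; it is not the
// v1alpha1 API type.
type condition struct {
	Type, Status, Reason, Message string
}

// conditionLine matches the "Type=Status: Reason(Message)" shape that the
// test prints for each condition (an assumption based on the log above).
var conditionLine = regexp.MustCompile(`(\w+)=(True|False|Unknown): (\w+)\((.*)\)$`)

// parseCondition extracts a condition from one log line, if the line has one.
func parseCondition(line string) (condition, bool) {
	m := conditionLine.FindStringSubmatch(line)
	if m == nil {
		return condition{}, false
	}
	return condition{Type: m[1], Status: m[2], Reason: m[3], Message: m[4]}, true
}

// notTrue returns every parsed condition whose status is not "True".
func notTrue(lines []string) []condition {
	var out []condition
	for _, l := range lines {
		if c, ok := parseCondition(l); ok && c.Status != "True" {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Condition lines copied from the failure log in this report.
	lines := []string{
		`control_plane_pki_operator.go:260: LeafCertificatesRegenerated=True: AsExpected(All leaf certificates are re-generated.)`,
		`control_plane_pki_operator.go:260: PreviousCertificatesRevoked=False: WaitingForAvailable(Previous signer certificate not yet revoked.)`,
		`control_plane_pki_operator.go:260: SignerClassValid=True: AsExpected(Signer class "customer-break-glass" known.)`,
	}
	for _, c := range notTrue(lines) {
		fmt.Printf("blocked on %s (%s): %s\n", c.Type, c.Reason, c.Message)
	}
}
```

Run against the lines in this report, the only condition that is not True is PreviousCertificatesRevoked, which matches the timeout message from eventually.go.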
      

      This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-ingress-operator/1147/pull-ci-openshift-cluster-ingress-operator-master-e2e-hypershift/1857099476942983168. Search.ci has other similar failures.

      Version-Release number of selected component (if applicable)

      I have seen this in 4.15 and master CI jobs.

      How reproducible

      Presently, search.ci shows the following stats for the past 14 days:

      pull-ci-openshift-cluster-ingress-operator-master-e2e-hypershift (all) - 47 runs, 43% failed, 5% of failures match = 2% impact
      pull-ci-openshift-machine-config-operator-master-e2e-hypershift (all) - 67 runs, 30% failed, 15% of failures match = 4% impact
      pull-ci-openshift-cluster-node-tuning-operator-master-e2e-hypershift (all) - 34 runs, 24% failed, 13% of failures match = 3% impact
      pull-ci-openshift-hypershift-main-e2e-aws (all) - 270 runs, 51% failed, 4% of failures match = 2% impact
      pull-ci-openshift-hypershift-main-e2e-aks (all) - 265 runs, 49% failed, 6% of failures match = 3% impact
      pull-ci-openshift-csi-operator-master-hypershift-aws-e2e-external (all) - 57 runs, 56% failed, 6% of failures match = 4% impact
      periodic-ci-openshift-hypershift-release-4.15-periodics-e2e-aws-ovn (all) - 104 runs, 50% failed, 4% of failures match = 2% impact
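As a reading aid for the figures above: the "impact" value appears to be the share of all runs that hit this failure, which is roughly the failure rate multiplied by the match rate. The sketch below works that arithmetic through in Go; the `impact` helper is a hypothetical name, and because search.ci displays rounded percentages, a recomputed value can differ from the reported one by about a point (for example, 56% × 6% gives 3.36%, where the list shows 4%).

```go
package main

import "fmt"

// impact approximates the "impact" percentage as failure rate times match
// rate. Both inputs and the result are percentages. This is an assumed
// reading of the search.ci output, not its actual implementation.
func impact(failedPct, matchPct float64) float64 {
	return failedPct * matchPct / 100
}

func main() {
	// Figures taken from the first job in the list above:
	// 47 runs, 43% failed, 5% of failures match.
	fmt.Printf("%.2f%%\n", impact(43, 5))
}
```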
      

      Steps to Reproduce

      1. Post a PR and have bad luck.
      2. Check search.ci: https://search.dptools.openshift.org/?search=FAIL%3A+TestCreateClusterV2%2FMain%2Fbreak-glass-credentials%2Findependent_signers&maxAge=336h&context=1&type=build-log&name=hypershift&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job
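The search.ci link in step 2 encodes the query in its URL parameters. As a small sketch using only the Go standard library, the following decodes the search string and the maxAge window from that URL; the `describe` helper is a hypothetical name introduced for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"time"
)

// searchURL is the search.ci link from step 2, copied verbatim.
const searchURL = "https://search.dptools.openshift.org/?search=FAIL%3A+TestCreateClusterV2%2FMain%2Fbreak-glass-credentials%2Findependent_signers&maxAge=336h&context=1&type=build-log&name=hypershift&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job"

// describe returns the decoded search string and the maxAge window.
func describe(raw string) (string, time.Duration, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", 0, err
	}
	q := u.Query()
	// maxAge uses an hours suffix here, which time.ParseDuration accepts.
	window, err := time.ParseDuration(q.Get("maxAge"))
	if err != nil {
		return "", 0, err
	}
	return q.Get("search"), window, nil
}

func main() {
	search, window, err := describe(searchURL)
	if err != nil {
		panic(err)
	}
	fmt.Printf("search: %q\n", search)
	fmt.Printf("window: %.0f days\n", window.Hours()/24)
}
```

The decoded window of 336h corresponds to the 14-day period quoted under "How reproducible".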

      Actual results

      CI fails.

      Expected results

      CI passes, or fails on some other test failure.

              agarcial@redhat.com Alberto Garcia Lamela
              mmasters1@redhat.com Miciah Masters
              Jie Zhao