OCPBUGS-62929

CI fails in RouteExternalCertificate tests due to missing permissions when creating routes in the BeforeEach block

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Affects versions: 4.19, 4.20, 4.21
    • Component: Networking / router
    • Quality / Stability / Reliability
    • Moderate
    • Rejected

      Description of problem

      CI is flaky because of test failures such as the following:

      started: 0/51/127 "[sig-network][OCPFeatureGate:RouteExternalCertificate][Feature:Router][apigroup:route.openshift.io] with invalid setup the router should not support external certificate if the secret is in a different namespace [Suite:openshift/conformance/parallel]"
      
        STEP: Creating a kubernetes client @ 10/09/25 22:36:56.504
      I1009 22:36:57.023253 288492 client.go:288] configPath is now "/tmp/configfile410008110"
      I1009 22:36:57.023294 288492 client.go:363] The user is now "e2e-test-router-external-certificate-4smsm-user"
      I1009 22:36:57.023305 288492 client.go:365] Creating project "e2e-test-router-external-certificate-4smsm"
      I1009 22:36:57.500362 288492 client.go:373] Waiting on permissions in project "e2e-test-router-external-certificate-4smsm" ...
      I1009 22:36:57.606045 288492 client.go:402] DeploymentConfig capability is enabled, adding 'deployer' SA to the list of default SAs
      I1009 22:36:57.631722 288492 client.go:417] Waiting for ServiceAccount "default" to be provisioned...
      I1009 22:36:57.785322 288492 client.go:417] Waiting for ServiceAccount "builder" to be provisioned...
      I1009 22:36:57.922781 288492 client.go:417] Waiting for ServiceAccount "deployer" to be provisioned...
      I1009 22:36:58.078724 288492 client.go:427] Waiting for RoleBinding "system:image-pullers" to be provisioned...
      I1009 22:36:58.286939 288492 client.go:427] Waiting for RoleBinding "system:image-builders" to be provisioned...
      I1009 22:36:58.303441 288492 client.go:427] Waiting for RoleBinding "system:deployers" to be provisioned...
      I1009 22:36:58.579653 288492 client.go:460] Project "e2e-test-router-external-certificate-4smsm" has been fully provisioned.
        STEP: creating pod @ 10/09/25 22:36:59.005
      I1009 22:36:59.006080 288492 client.go:1023] Running 'oc --namespace=e2e-test-router-external-certificate-4smsm --kubeconfig=/tmp/configfile410008110 create -f /tmp/fixture-testdata-dir2217410147/test/extended/testdata/cmd/test/cmd/testdata/hello-openshift/hello-pod.json -n e2e-test-router-external-certificate-4smsm'
      pod/hello-openshift created
        STEP: waiting for the pod to be running @ 10/09/25 22:36:59.173
        STEP: creating service @ 10/09/25 22:37:03.244
      I1009 22:37:03.244634 288492 client.go:1023] Running 'oc --namespace=e2e-test-router-external-certificate-4smsm --kubeconfig=/tmp/configfile410008110 expose pod hello-openshift -n e2e-test-router-external-certificate-4smsm'
      service/hello-openshift exposed
        STEP: waiting for the service to become available @ 10/09/25 22:37:03.885
        STEP: Creating a TLS certificate secret @ 10/09/25 22:37:04.062
        STEP: Providing router service account permissions to get,list,watch the secret @ 10/09/25 22:37:04.08
        STEP: Creating multiple routes referencing same external certificate @ 10/09/25 22:37:06.003
        [FAILED] in [BeforeEach] - github.com/openshift/origin/test/extended/router/external_certificate.go:214 @ 10/09/25 22:37:06.027
        STEP: Collecting events from namespace "e2e-test-router-external-certificate-4smsm". @ 10/09/25 22:37:06.028
        STEP: Found 7 events. @ 10/09/25 22:37:06.072
      I1009 22:37:06.072748 288492 dump.go:53] At 0001-01-01 00:00:00 +0000 UTC - event for hello-openshift: { } Scheduled: Successfully assigned e2e-test-router-external-certificate-4smsm/hello-openshift to ci-op-zm9swcry-4cd5c-vpqfk-worker-0-wwqf9
      I1009 22:37:06.072766 288492 dump.go:53] At 0001-01-01 00:00:00 +0000 UTC - event for hello-openshift: { } ClusterIPNotAllocated: Cluster IP [IPv4]: 172.30.219.10 is not allocated; repairing
      I1009 22:37:06.072776 288492 dump.go:53] At 0001-01-01 00:00:00 +0000 UTC - event for hello-openshift: { } ClusterIPNotAllocated: Cluster IP [IPv4]: 172.30.219.10 is not allocated; repairing
      I1009 22:37:06.072785 288492 dump.go:53] At 2025-10-09 22:37:01 +0000 UTC - event for hello-openshift: {multus } AddedInterface: Add eth0 [10.129.3.0/23] from ovn-kubernetes
      I1009 22:37:06.072793 288492 dump.go:53] At 2025-10-09 22:37:01 +0000 UTC - event for hello-openshift: {kubelet ci-op-zm9swcry-4cd5c-vpqfk-worker-0-wwqf9} Pulled: Container image "quay.io/openshift/community-e2e-images:e2e-1-registry-k8s-io-e2e-test-images-agnhost-2-53-S5hiptYgC5MyFXZH" already present on machine
      I1009 22:37:06.072802 288492 dump.go:53] At 2025-10-09 22:37:01 +0000 UTC - event for hello-openshift: {kubelet ci-op-zm9swcry-4cd5c-vpqfk-worker-0-wwqf9} Created: Created container: hello-openshift
      I1009 22:37:06.072811 288492 dump.go:53] At 2025-10-09 22:37:01 +0000 UTC - event for hello-openshift: {kubelet ci-op-zm9swcry-4cd5c-vpqfk-worker-0-wwqf9} Started: Started container hello-openshift
      I1009 22:37:06.573021 288492 resource.go:168] POD              NODE                                       PHASE    GRACE  CONDITIONS
      I1009 22:37:06.573108 288492 resource.go:175] hello-openshift  ci-op-zm9swcry-4cd5c-vpqfk-worker-0-wwqf9  Running         [{PodReadyToStartContainers 0 True 0001-01-01 00:00:00 +0000 UTC 2025-10-09 22:37:02 +0000 UTC  } {Initialized 0 True 0001-01-01 00:00:00 +0000 UTC 2025-10-09 22:36:59 +0000 UTC  } {Ready 0 True 0001-01-01 00:00:00 +0000 UTC 2025-10-09 22:37:02 +0000 UTC  } {ContainersReady 0 True 0001-01-01 00:00:00 +0000 UTC 2025-10-09 22:37:02 +0000 UTC  } {PodScheduled 0 True 0001-01-01 00:00:00 +0000 UTC 2025-10-09 22:36:59 +0000 UTC  }]
      I1009 22:37:06.573132 288492 resource.go:178] 
      I1009 22:37:06.766476 288492 dump.go:81] skipping dumping cluster info - cluster too large
      I1009 22:37:06.833762 288492 client.go:676] Deleted {user.openshift.io/v1, Resource=users  e2e-test-router-external-certificate-4smsm-user}, err: <nil>
      I1009 22:37:06.895583 288492 client.go:676] Deleted {oauth.openshift.io/v1, Resource=oauthclients  e2e-client-e2e-test-router-external-certificate-4smsm}, err: <nil>
      I1009 22:37:06.924564 288492 client.go:676] Deleted {oauth.openshift.io/v1, Resource=oauthaccesstokens  sha256~bq7PIOxtIGGmhNS2NLCplcUSkNCKB0SE4fQBQ0JGfXg}, err: <nil>
        STEP: Destroying namespace "e2e-test-router-external-certificate-4smsm" for this suite. @ 10/09/25 22:37:06.924
      
      fail [github.com/openshift/origin/test/extended/router/external_certificate.go:214]: Unexpected error:
          <*errors.StatusError | 0xc00278a960>: 
          Route.route.openshift.io "route-0" is invalid: [spec.tls.externalCertificate: Forbidden: router serviceaccount does not have permission to get this secret, spec.tls.externalCertificate: Forbidden: router serviceaccount does not have permission to watch this secret, spec.tls.externalCertificate: Forbidden: router serviceaccount does not have permission to list this secret]
          {
              ErrStatus: 
                  code: 422
                  details:
                    causes:
                    - field: spec.tls.externalCertificate
                      message: 'Forbidden: router serviceaccount does not have permission to get this
                        secret'
                      reason: FieldValueForbidden
                    - field: spec.tls.externalCertificate
                      message: 'Forbidden: router serviceaccount does not have permission to watch this
                        secret'
                      reason: FieldValueForbidden
                    - field: spec.tls.externalCertificate
                      message: 'Forbidden: router serviceaccount does not have permission to list this
                        secret'
                      reason: FieldValueForbidden
                    group: route.openshift.io
                    kind: Route
                    name: route-0
                  message: 'Route.route.openshift.io "route-0" is invalid: [spec.tls.externalCertificate:
                    Forbidden: router serviceaccount does not have permission to get this secret, spec.tls.externalCertificate:
                    Forbidden: router serviceaccount does not have permission to watch this secret,
                    spec.tls.externalCertificate: Forbidden: router serviceaccount does not have permission
                    to list this secret]'
                  metadata: {}
                  reason: Invalid
                  status: Failure,
          }
      occurred
      failed: (10.5s) 2025-10-09T22:37:06 "[sig-network][OCPFeatureGate:RouteExternalCertificate][Feature:Router][apigroup:route.openshift.io] with valid setup the router should support external certificate and the secret is deleted and re-created again but RBAC permissions are dropped then routes are not reachable [Suite:openshift/conformance/parallel]"
      

      This particular failure comes from https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/29900/pull-ci-openshift-origin-main-e2e-vsphere-ovn/1976388824435003392. Search.ci has other similar failures.

      Version-Release number of selected component (if applicable)

      I have seen this in 4.19, 4.20, and 4.21 CI jobs.

      How reproducible

      Presently, search.ci shows the following stats for the past two days:

      periodic-ci-openshift-release-master-nightly-4.19-e2e-vsphere-static-ovn (all) - 8 runs, 38% failed, 33% of failures match = 13% impact
      pull-ci-openshift-console-main-okd-scos-e2e-aws-ovn (all) - 65 runs, 46% failed, 7% of failures match = 3% impact
      periodic-ci-openshift-release-master-nightly-4.20-e2e-aws-ovn-upgrade-fips (all) - 70 runs, 26% failed, 6% of failures match = 1% impact
      periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-upi (all) - 7 runs, 14% failed, 200% of failures match = 29% impact
      pull-ci-openshift-origin-main-e2e-vsphere-ovn (all) - 61 runs, 33% failed, 5% of failures match = 2% impact
      periodic-ci-openshift-release-master-okd-scos-4.21-e2e-aws-ovn-upgrade (all) - 8 runs, 88% failed, 14% of failures match = 13% impact
      pull-ci-openshift-ovn-kubernetes-master-okd-scos-e2e-aws-ovn (all) - 12 runs, 58% failed, 14% of failures match = 8% impact
      periodic-ci-openshift-release-master-ci-4.20-e2e-gcp-ovn-upgrade (all) - 76 runs, 29% failed, 9% of failures match = 3% impact
      pull-ci-openshift-cluster-version-operator-main-e2e-aws-ovn-techpreview (all) - 10 runs, 50% failed, 20% of failures match = 10% impact
      pull-ci-openshift-origin-main-e2e-vsphere-ovn-upi (all) - 63 runs, 40% failed, 8% of failures match = 3% impact
      pull-ci-openshift-origin-main-okd-scos-e2e-aws-ovn (all) - 39 runs, 59% failed, 13% of failures match = 8% impact
      openshift-kubernetes-2484-nightly-4.21-e2e-vsphere-ovn (all) - 4 runs, 25% failed, 100% of failures match = 25% impact
      periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-techpreview (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
      pull-ci-openshift-cluster-etcd-operator-main-okd-scos-e2e-aws-ovn (all) - 12 runs, 50% failed, 17% of failures match = 8% impact
      pull-ci-openshift-installer-main-okd-scos-e2e-aws-ovn (all) - 16 runs, 88% failed, 7% of failures match = 6% impact
      periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn-upgrade (all) - 8 runs, 75% failed, 17% of failures match = 13% impact
      periodic-ci-openshift-release-master-nightly-4.21-e2e-aws-ovn-upgrade-fips (all) - 79 runs, 77% failed, 3% of failures match = 3% impact
      periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-gcp-ovn-rt-upgrade (all) - 69 runs, 29% failed, 15% of failures match = 4% impact
      periodic-ci-openshift-release-master-ci-4.19-e2e-gcp-ovn-upgrade (all) - 14 runs, 14% failed, 50% of failures match = 7% impact
      periodic-ci-openshift-release-master-ci-4.20-e2e-azure-ovn-upgrade (all) - 61 runs, 33% failed, 10% of failures match = 3% impact
      periodic-ci-openshift-release-master-ci-4.20-upgrade-from-stable-4.19-e2e-azure-ovn-upgrade (all) - 80 runs, 38% failed, 3% of failures match = 1% impact
      pull-ci-openshift-origin-main-e2e-gcp-ovn (all) - 67 runs, 45% failed, 3% of failures match = 1% impact
      periodic-ci-openshift-multiarch-master-nightly-4.19-ocp-e2e-upgrade-aws-ovn-arm64 (all) - 7 runs, 14% failed, 100% of failures match = 14% impact
      pull-ci-openshift-hypershift-main-okd-scos-e2e-aws-ovn (all) - 71 runs, 70% failed, 12% of failures match = 8% impact
      periodic-ci-openshift-release-master-ci-4.21-upgrade-from-stable-4.20-e2e-aws-ovn-upgrade (all) - 88 runs, 39% failed, 3% of failures match = 1% impact
      periodic-ci-openshift-multiarch-master-nightly-4.20-ocp-e2e-aws-ovn-arm64 (all) - 9 runs, 11% failed, 100% of failures match = 11% impact
      pull-ci-openshift-cluster-ingress-operator-master-okd-scos-e2e-aws-ovn (all) - 4 runs, 50% failed, 50% of failures match = 25% impact
      periodic-ci-openshift-release-master-okd-scos-4.21-e2e-aws-ovn-techpreview (all) - 10 runs, 90% failed, 11% of failures match = 10% impact
      pull-ci-openshift-monitoring-plugin-main-okd-scos-e2e-aws-ovn (all) - 18 runs, 44% failed, 25% of failures match = 11% impact
      openshift-kubernetes-2484-ci-4.21-e2e-gcp-ovn-upgrade (all) - 56 runs, 34% failed, 5% of failures match = 2% impact
      periodic-ci-openshift-release-master-okd-scos-4.21-e2e-vsphere-ovn (all) - 8 runs, 75% failed, 17% of failures match = 13% impact
      pull-ci-openshift-cloud-credential-operator-master-okd-scos-e2e-aws-ovn (all) - 5 runs, 100% failed, 20% of failures match = 20% impact
      openshift-cluster-network-operator-2809-openshift-ovn-kubernetes-2774-openshift-machine-config-operator-5324-periodics-e2e-azure-aks-ovn-conformance (all) - 30 runs, 93% failed, 4% of failures match = 3% impact
      periodic-ci-openshift-release-master-nightly-4.21-e2e-vsphere-ovn-upi-multi-vcenter (all) - 2 runs, 50% failed, 100% of failures match = 50% impact
      periodic-ci-openshift-multiarch-master-nightly-4.19-ocp-e2e-ovn-remote-libvirt-s390x (all) - 7 runs, 100% failed, 14% of failures match = 14% impact
      pull-ci-openshift-operator-framework-operator-controller-main-okd-scos-e2e-aws-ovn (all) - 30 runs, 47% failed, 7% of failures match = 3% impact
      periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn (all) - 6 runs, 67% failed, 25% of failures match = 17% impact
      periodic-ci-openshift-release-master-nightly-4.19-e2e-vsphere-ovn-upi (all) - 7 runs, 29% failed, 50% of failures match = 14% impact
      periodic-ci-openshift-release-master-okd-scos-4.20-e2e-aws-ovn-techpreview (all) - 8 runs, 88% failed, 14% of failures match = 13% impact
      periodic-ci-openshift-release-master-okd-scos-4.20-e2e-vsphere-ovn (all) - 7 runs, 43% failed, 33% of failures match = 14% impact
      periodic-ci-openshift-multiarch-master-nightly-4.20-ocp-e2e-upgrade-aws-ovn-multi-a-a (all) - 8 runs, 13% failed, 100% of failures match = 13% impact
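
The impact figure on each line above appears to be the share of all runs that hit this failure, that is, the failure rate multiplied by the match rate. The following is a minimal sketch of that arithmetic, assuming the line format quoted above. The names `STATS_RE` and `impact_percent` are illustrative and are not part of search.ci.

```python
import re

# Matches a search.ci job summary line in the format quoted in this report.
STATS_RE = re.compile(
    r"(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact"
)


def impact_percent(line: str) -> tuple[str, float, int]:
    """Return (job, recomputed impact %, reported impact %) for one line."""
    m = STATS_RE.search(line)
    if m is None:
        raise ValueError(f"unrecognized stats line: {line!r}")
    failed = int(m["failed"]) / 100  # fraction of runs that failed
    match = int(m["match"]) / 100    # fraction of failures matching the search
    return m["job"], failed * match * 100, int(m["impact"])


line = (
    "periodic-ci-openshift-release-master-nightly-4.19-e2e-vsphere-static-ovn "
    "(all) - 8 runs, 38% failed, 33% of failures match = 13% impact"
)
job, recomputed, reported = impact_percent(line)
print(f"{job}: recomputed {recomputed:.1f}%, reported {reported}%")
```

Because the quoted percentages are already rounded, the recomputed value can differ from the reported one by about a point. For the 4.21 vsphere-ovn-upi job, 14% × 200% gives 28% against a reported 29%, presumably because search.ci works from the raw run counts.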
      

      Steps to Reproduce

      1. Post a PR that runs the openshift/conformance/parallel suite and have bad luck (the failure is intermittent).
      2. Check search.ci: https://search.dptools.openshift.org/?search=fail+%5C%5Bgithub%5C.com%2Fopenshift%2Forigin%2Ftest%2Fextended%2Frouter%2Fexternal_certificate%5C.go%3A214%5C%5D%3A+Unexpected+error%3A&maxAge=48h&context=1&type=junit&name=&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

      Actual results

      CI fails.

      Expected results

      CI passes, or fails on some other test failure.

      Additional info

      The search pattern in my search.ci link includes a line number (external_certificate.go:214), so any code change that moves the failing assertion to a different line will invalidate the search. I did not search on a specific test name because the assertion is in a BeforeEach block and therefore fails multiple tests.
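
The line-number dependence described above could be relaxed by matching any line number in the file. The following sketch decodes the `search` parameter from the search.ci link (trimmed here to the parameters that matter) and derives a relaxed pattern. The relaxed pattern is a hypothetical alternative, not the query used in this report, and it assumes search.ci accepts `\d+` the same way Python's `re` module does.

```python
import re
from urllib.parse import parse_qs, urlsplit

# The search.ci link from the steps to reproduce, trimmed to a few parameters.
SEARCH_URL = (
    "https://search.dptools.openshift.org/?search=fail+%5C%5Bgithub%5C.com%2F"
    "openshift%2Forigin%2Ftest%2Fextended%2Frouter%2Fexternal_certificate%5C.go"
    "%3A214%5C%5D%3A+Unexpected+error%3A&maxAge=48h&context=1&type=junit"
)

# Decode the regular expression that the link carries.
original = parse_qs(urlsplit(SEARCH_URL).query)["search"][0]

# Hypothetical relaxed variant: accept any line number instead of 214.
relaxed = original.replace(":214", r":\d+")

# The failure line from the log, and the same line after an assumed code
# change that moves the assertion to line 220.
failure = (
    "fail [github.com/openshift/origin/test/extended/router/"
    "external_certificate.go:214]: Unexpected error:"
)
moved = failure.replace(":214]", ":220]")

print(original)
print(bool(re.search(original, moved)), bool(re.search(relaxed, moved)))
```

The second line printed is `False True`: the original pattern stops matching once the assertion moves, while the relaxed one still matches. The trade-off is that the relaxed pattern also matches any other failing assertion in external_certificate.go, so it may over-count.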

              btofelrh Brett Tofel
              mmasters1@redhat.com Miciah Masters
              Hongan Li