OpenShift Bugs / OCPBUGS-8478

TestBoundTokenSignerController causes unrecoverable disruption in e2e-gcp-operator CI job


    • Critical
    • No
    • Proposed
    • False


    • N/A
    • Release Note Not Required

      The cluster-kube-apiserver-operator CI, and more specifically the e2e-gcp-operator job, has been failing constantly for the past week because the test cluster ends up in a state where many requests start failing with "Unauthorized" errors.

      This caused multiple operators to become degraded and tests to fail.


      Looking at the failures and at a must-gather we were able to capture inside a test cluster, the service account issuer appeared to be the likely culprit. Because of that we opened https://issues.redhat.com/browse/API-1549.

      However, it turned out that disabling TestServiceAccountIssuer didn't resolve the issue, and the cluster was still too unstable for the tests to pass.

      In a separate attempt we also tried disabling TestBoundTokenSignerController, and this time the tests passed. However, the cluster was still very unstable during the e2e run and the kube-apiserver-operator went degraded a couple of times, as shown in this operator log: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1455/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-gcp-operator/1632871645171421184/artifacts/e2e-gcp-operator/gather-extra/artifacts/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-5cf9d4569-m2spq_kube-apiserver-operator.log

      On top of that, instead of "Unauthorized" errors, we are now seeing a lot of "connection refused" errors.
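
      To compare the two failure modes between runs, something like the following could be used on a locally downloaded operator log. This is only a rough sketch: the function names are hypothetical, and the two substrings are taken from the descriptions above, so the exact wording in the real logs is an assumption.

```python
from collections import Counter

# Substrings taken from the error descriptions in this report; the exact
# wording in the real operator logs is an assumption.
PATTERNS = ("Unauthorized", "connection refused")


def count_error_patterns(lines, patterns=PATTERNS):
    """Count how many lines contain each pattern (case-insensitive)."""
    counts = Counter({pattern: 0 for pattern in patterns})
    for line in lines:
        lowered = line.lower()
        for pattern in patterns:
            if pattern.lower() in lowered:
                counts[pattern] += 1
    return counts


def count_in_file(path):
    """Run the same count over a log file saved locally (hypothetical path)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return count_error_patterns(handle)
```

      Running count_in_file on the log from a run with TestBoundTokenSignerController enabled and on one with it disabled should show whether the failures really moved from one error type to the other.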

            Damien Grisonnet (dgrisonn@redhat.com)
            Damien Grisonnet (dgrisonn@redhat.com)
            Rahul Gangwar