The cluster-kube-apiserver-operator CI has been failing consistently for the past week, specifically the e2e-gcp-operator job, because the test cluster ends up in a state where many requests start failing with "Unauthorized" errors.
This caused multiple operators to become degraded and tests to fail.
Looking at the failures and at a must-gather we were able to capture from inside a test cluster, it turned out that the service account issuer could be the culprit, so we opened https://issues.redhat.com/browse/API-1549.
However, it turned out that disabling TestServiceAccountIssuer didn't resolve the issue, and the cluster was still too unstable for the tests to pass.
In a separate attempt, we also disabled TestBoundTokenSignerController, and this time the tests passed. However, the cluster was still very unstable during the e2e run, and the kube-apiserver-operator became degraded a couple of times: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1455/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-gcp-operator/1632871645171421184/artifacts/e2e-gcp-operator/gather-extra/artifacts/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-5cf9d4569-m2spq_kube-apiserver-operator.log.
On top of that, instead of "Unauthorized" errors, we are now seeing many "connection refused" errors.
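The shift from "Unauthorized" to "connection refused" errors can be tracked across runs with a rough tally over a gathered operator log such as the one linked above. This is a minimal sketch, assuming both errors appear verbatim as these substrings in the log lines; the script, the `ERROR_PATTERNS` table, and `count_error_classes` are hypothetical helpers, not part of any existing OpenShift tooling.

```python
import sys
from typing import Dict, Iterable

# Hypothetical mapping: error class name -> substring assumed to identify it in a log line.
ERROR_PATTERNS = {
    "unauthorized": "Unauthorized",
    "connection_refused": "connection refused",
}


def count_error_classes(lines: Iterable[str]) -> Dict[str, int]:
    """Count log lines containing each error pattern (a line may match several)."""
    counts = {name: 0 for name in ERROR_PATTERNS}
    for line in lines:
        for name, needle in ERROR_PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts


# Usage: python count_errors.py kube-apiserver-operator.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for name, n in count_error_classes(f).items():
            print(f"{name}: {n}")
```

Plain substring matching is deliberately crude: it can over-count lines that merely mention the phrase, but it is enough to compare the relative volume of the two error classes between CI runs.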
- clones: OCPBUGS-8475 TestBoundTokenSignerController causes unrecoverable disruption in e2e-gcp-operator CI job (Closed)
- is depended on by: OCPBUGS-8475 TestBoundTokenSignerController causes unrecoverable disruption in e2e-gcp-operator CI job (Closed)
- links to: RHEA-2023:5006 rpm