Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: 4.14.0
Affects Version/s: 4.13
Component/s: kube-apiserver
Labels:
- api
- no-doc
- no-qe

Severity: Critical
Regression: No
Release Blocker: Proposed
Blocked: False
Blocked Reason: None
Release Note Text: N/A
Release Note Type: Release Note Not Required
Target Version: 4.14.0
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

The cluster-kube-apiserver-operator CI has been failing consistently for the past week, specifically the e2e-gcp-operator job: the test cluster ends up in a state where many requests fail with "Unauthorized" errors.

This caused multiple operators to become degraded and tests to fail.

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1450/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-gcp-operator/1631333936435040256

Looking at the failures and a must-gather we were able to capture from a test cluster, it appeared that the service account issuer could be the culprit. Because of that, we opened https://issues.redhat.com/browse/API-1549.

However, it turned out that disabling TestServiceAccountIssuer didn't resolve the issue, and the cluster was still too unstable for the tests to pass.

In a separate attempt, we also tried disabling TestBoundTokenSignerController, and this time the tests passed. However, the cluster was still very unstable during the e2e run and the kube-apiserver-operator went degraded a couple of times: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/1455/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-gcp-operator/1632871645171421184/artifacts/e2e-gcp-operator/gather-extra/artifacts/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-5cf9d4569-m2spq_kube-apiserver-operator.log.

On top of that, instead of "Unauthorized" errors, we are now seeing many "connection refused" errors.

clones: OCPBUGS-8475 TestBoundTokenSignerController causes unrecoverable disruption in e2e-gcp-operator CI job (Closed)
is depended on by: OCPBUGS-8475 TestBoundTokenSignerController causes unrecoverable disruption in e2e-gcp-operator CI job (Closed)
links to:
- openshift/cluster-kube-apiserver-operator#1455: OCPBUGS-8478: Disable TestBoundTokenSignerController
- RHEA-2023:5006 rpm

Assignee: Damien Grisonnet
Reporter: Damien Grisonnet
QA Contact: Rahul Gangwar
Votes: 0
Watchers: 7
Created: 2023/03/07 12:29 PM
Updated: 2024/04/29 5:05 PM
Resolved: 2023/10/31 12:56 PM
