-
Bug
-
Resolution: Unresolved
-
Minor
-
4.20.z
-
Quality / Stability / Reliability
-
False
-
s390x
The authentication operator keeps failing for a few hours after cluster install.
The authentication operator appears to have a problem reaching the OAuth server's /healthz endpoint, which makes the operator's Available status flap between True and False. This in turn causes oc login failures.
We are observing this on a few of our servers, starting with the 4.20.0-ec.6 build, and have also observed it on 4.20.0-rc.0 and 4.20.0-rc.1.
I suspect the following PR, listed in the 4.20.0-ec.6 release (https://s390x.ocp.releases.ci.openshift.org/releasestream/4-dev-preview-s390x/release/4.20.0-ec.6), is causing this issue:
AUTH-543: OIDC/OAuth resource configuration #740
https://github.com/openshift/cluster-authentication-operator/pull/740
Steps to reproduce:
Install an OCP cluster on IBM Z.
Observation:
I checked the authentication operator in a loop and could see Available flip from True to False and back to True within a minute.
Mon Sep 15 04:28:24 AM EDT 2025
authentication   4.20.0-rc.1   True    False   False   0s
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
Mon Sep 15 04:28:27 AM EDT 2025
authentication   4.20.0-rc.1   True    False   False   1s
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
Mon Sep 15 04:28:29 AM EDT 2025
authentication   4.20.0-rc.1   True    False   False   0s
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
Mon Sep 15 04:28:31 AM EDT 2025
authentication   4.20.0-rc.1   False   False   False   0s   OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz": EOF
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
Mon Sep 15 04:28:34 AM EDT 2025
authentication   4.20.0-rc.1   False   False   False   1s   OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz": EOF
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
Mon Sep 15 04:28:36 AM EDT 2025
authentication   4.20.0-rc.1   False   False   False   0s   OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz": EOF
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
Mon Sep 15 04:28:38 AM EDT 2025
authentication   4.20.0-rc.1   False   False   False   0s   OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz": EOF
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
Mon Sep 15 04:28:40 AM EDT 2025
authentication   4.20.0-rc.1   True    False   False   1s
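To quantify the flapping rather than eyeball it, the captured loop output can be fed through a small script. The following is only a sketch: it assumes the authentication row keeps the column order seen in the samples above (name, version, AVAILABLE, PROGRESSING, DEGRADED, SINCE), and the function names `available_samples` and `count_flaps` are made up for illustration.

```python
import re
import sys

# Assumed row layout, taken from the samples in this report:
# authentication <version> <AVAILABLE> <PROGRESSING> <DEGRADED> <SINCE> [message]
ROW = re.compile(
    r"^authentication\s+\S+\s+(True|False)\s+(True|False)\s+(True|False)\s+\S+"
)


def available_samples(lines):
    """Return the AVAILABLE column (as bool) for every authentication row."""
    samples = []
    for line in lines:
        match = ROW.match(line.strip())
        if match:
            samples.append(match.group(1) == "True")
    return samples


def count_flaps(samples):
    """Count how many times two consecutive samples differ."""
    return sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)


if __name__ == "__main__":
    # Usage (hypothetical file name): python flaps.py < loop-output.txt
    observed = available_samples(sys.stdin)
    print(f"{len(observed)} samples, {count_flaps(observed)} Available transitions")
```

Lines that do not start with `authentication` (timestamps, the clusterversion header and row) are ignored, so the raw loop output can be piped in unchanged.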
Checked the operator logs
Using project "openshift-authentication-operator" on server "https://api.pok-130.ocptest.pok.stglabs.ibm.com:6443".
[root@bastion ~]# la
Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
NAME                                           READY   STATUS    RESTARTS   AGE
pod/authentication-operator-68f9b78996-wcct2   1/1     Running   0          2d2h
NAME              TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)   AGE
service/metrics   ClusterIP   172.*.*.123   <none>        443/TCP   2d2h
NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/authentication-operator   1/1     1            1           2d2h
NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/authentication-operator-68f9b78996   1         1         1       2d2h
Redirected the logs to a file:
llo pod/authentication-operator-68f9b78996-wcct2 > authpod.log
First and last errors observed in the operator pod logs:
I0915 12:35:03.701899 1 status_controller.go:230] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2025-09-15T04:48:27Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2025-09-15T04:59:24Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2025-09-15T12:35:03Z","message":"OAuthServerRouteEndpointAccessibleControllerAvailable: Get \"https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz\": EOF","reason":"OAuthServerRouteEndpointAccessibleController_EndpointUnavailable","status":"False","type":"Available"},{"lastTransitionTime":"2025-09-15T04:33:55Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"},{"lastTransitionTime":"2025-09-15T04:33:54Z","reason":"NoData","status":"Unknown","type":"EvaluationConditionsDetected"}]}}
E0915 12:35:03.734011 1 base_controller.go:279] "Unhandled Error" err="OAuthServerRouteEndpointAccessibleController reconciliation failed: Get \"https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz\": EOF"
grep -i '"Unhandled Error"' authpod.log |head -n 1
E0915 11:06:44.367223 1 base_controller.go:279] "Unhandled Error" err="StatusSyncer_authentication reconciliation failed: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"authentication\": the object has been modified; please apply your changes to the latest version and try again"
[root@bastion ~]# grep -i '"Unhandled Error"' authpod.log |tail -n 1
E0915 12:35:04.099102 1 base_controller.go:279] "Unhandled Error" err="StatusSyncer_authentication reconciliation failed: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"authentication\": the object has been modified; please apply your changes to the latest version and try again"
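The grep with head and tail above shows only the first and last "Unhandled Error" entries. A per-controller breakdown may make it easier to tell how often the /healthz EOF occurs compared with the StatusSyncer update conflicts. The sketch below assumes the error lines follow the shape seen in this excerpt (`E<MMDD> <time> ... "Unhandled Error" err="<Controller> reconciliation failed: ..."`); `summarize_unhandled` is a hypothetical helper name, and timestamps are compared in file order only.

```python
import re
import sys
from collections import Counter

# Assumed line shape, based on the operator log excerpt in this report.
UNHANDLED = re.compile(
    r'^E\d{4} (\d{2}:\d{2}:\d{2})\.\d+ .*"Unhandled Error" '
    r'err="(\S+) reconciliation failed'
)


def summarize_unhandled(lines):
    """Return (errors per controller, first timestamp, last timestamp)."""
    counts = Counter()
    first = last = None
    for line in lines:
        match = UNHANDLED.match(line)
        if not match:
            continue
        timestamp, controller = match.groups()
        counts[controller] += 1
        if first is None:
            first = timestamp
        last = timestamp
    return counts, first, last


if __name__ == "__main__":
    # Usage: python summarize.py < authpod.log
    per_controller, first_seen, last_seen = summarize_unhandled(sys.stdin)
    print(f"first: {first_seen}  last: {last_seen}")
    for name, total in per_controller.most_common():
        print(f"{total:6d}  {name}")
```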
All other operators have shown Available=True for about 50 hours (2d2h), but the authentication operator shows only 42 hours:
[root@bastion ~]# lco |head -n 5
NAME                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication             4.20.0-rc.1   True        False         False      42h
baremetal                  4.20.0-rc.1   True        False         False      2d2h
cloud-controller-manager   4.20.0-rc.1   True        False         False      2d2h
cloud-credential           4.20.0-rc.1   True        False         False      2d2h