OpenShift Bugs / OCPBUGS-61854

Authentication operator fails for a few hours after cluster install.


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • 4.20.z
    • Quality / Stability / Reliability
    • s390x

      The authentication operator fails for a few hours after cluster install.

      The operator appears to have a problem reaching the OAuth server's healthz endpoint, which makes its Available status flap between True and False. This in turn causes oc login failures.
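One way to check the healthz endpoint independently of the operator is to probe it repeatedly from the bastion. The following is a minimal sketch, not part of the original report: the function name `probe_healthz`, the attempt count, the interval, and the use of `-k` to skip TLS verification are all assumptions.

```shell
# probe_healthz URL [COUNT] [INTERVAL_SECONDS]
# Hypothetical helper: requests URL COUNT times, prints one line per attempt,
# then a summary line "failures: N/COUNT".
probe_healthz() {
  url="$1"
  count="${2:-10}"
  interval="${3:-2}"
  failures=0
  i=1
  while [ "$i" -le "$count" ]; do
    # -k skips TLS verification (assumption: the client may not trust the route certificate).
    # On a transport error curl exits non-zero and usually reports the code as 000.
    if code=$(curl -k -s -o /dev/null --max-time 5 -w '%{http_code}' "$url") \
       && [ "$code" = "200" ]; then
      echo "$(date -u +%H:%M:%S) ok $code"
    else
      failures=$((failures + 1))
      echo "$(date -u +%H:%M:%S) FAIL ${code:-000}"
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "failures: $failures/$count"
}

# Example (URL taken from the operator message in this report):
# probe_healthz https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz 30 2
```

If the probe fails intermittently from the bastion as well, the problem is more likely in the route or ingress path than in the operator's own check.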

      We have observed this on a few of our servers starting with the 4.20.0-ec.6 build, and also on 4.20.0-rc.0 and 4.20.0-rc.1.

      I suspect the following PR is causing this issue (release page: https://s390x.ocp.releases.ci.openshift.org/releasestream/4-dev-preview-s390x/release/4.20.0-ec.6):
      AUTH-543: OIDC/OAuth resource configuration #740
      https://github.com/openshift/cluster-authentication-operator/pull/740

      Steps to reproduce:
      Install an OCP cluster on IBM Z.

      Observation:
      I checked the authentication operator in a loop and saw Available flip from True to False and back to True within a minute.

      Mon Sep 15 04:28:24 AM EDT 2025
      authentication                             4.20.0-rc.1   True        False         False      0s      
      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
      Mon Sep 15 04:28:27 AM EDT 2025
      authentication                             4.20.0-rc.1   True        False         False      1s      
      
      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
      Mon Sep 15 04:28:29 AM EDT 2025
      authentication                             4.20.0-rc.1   True        False         False      0s      
       
      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
      Mon Sep 15 04:28:31 AM EDT 2025
      
      authentication                             4.20.0-rc.1   False       False         False      0s      OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz": EOF
      
      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
      Mon Sep 15 04:28:34 AM EDT 2025
      
      authentication                             4.20.0-rc.1   False       False         False      1s      OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz": EOF
      
      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
      Mon Sep 15 04:28:36 AM EDT 2025
      
      authentication                             4.20.0-rc.1   False       False         False      0s      OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz": EOF
      
      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
      Mon Sep 15 04:28:38 AM EDT 2025
      authentication                             4.20.0-rc.1   False       False         False      0s      OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz": EOF
      
      
      NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-rc.1   True        False         3h29m   Error while reconciling 4.20.0-rc.1: the cluster operator authentication is not available
      Mon Sep 15 04:28:40 AM EDT 2025
      authentication                             4.20.0-rc.1   True        False         False      1s    
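The exact loop used to collect the output above is not shown in the report. A minimal sketch that would produce output in the same shape, and count the Available flips in a captured run, might look like this. The function names, iteration count, and interval are assumptions.

```shell
# watch_auth_operator [ITERATIONS] [INTERVAL_SECONDS]
# Hypothetical helper: prints a timestamp, the authentication clusteroperator row,
# and the clusterversion status on every iteration.
watch_auth_operator() {
  iterations="${1:-30}"
  interval="${2:-2}"
  i=0
  while [ "$i" -lt "$iterations" ]; do
    date
    oc get clusteroperator authentication --no-headers
    oc get clusterversion
    sleep "$interval"
    i=$((i + 1))
  done
}

# count_available_flips
# Reads captured loop output on stdin and prints how many times the AVAILABLE
# column (third field) of the "authentication" row changed between samples.
count_available_flips() {
  awk '$1 == "authentication" {
         if (prev != "" && $3 != prev) flips++
         prev = $3
       }
       END { print flips + 0 }'
}

# Example:
# watch_auth_operator 60 2 | tee auth-watch.log
# count_available_flips < auth-watch.log
```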
      

      Checked the operator pod and its logs:

      Using project "openshift-authentication-operator" on server "https://api.pok-130.ocptest.pok.stglabs.ibm.com:6443".
      [root@bastion ~]# la
      Warning: apps.openshift.io/v1 DeploymentConfig is deprecated in v4.14+, unavailable in v4.10000+
      NAME                                           READY   STATUS    RESTARTS   AGE
      pod/authentication-operator-68f9b78996-wcct2   1/1     Running   0          2d2h
      
      NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
      service/metrics   ClusterIP   172.*.*.123   <none>        443/TCP   2d2h
      
      NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/authentication-operator   1/1     1            1           2d2h
      
      NAME                                                 DESIRED   CURRENT   READY   AGE
      replicaset.apps/authentication-operator-68f9b78996   1         1         1       2d2h
      

      Redirected the pod logs to a file:
      llo pod/authentication-operator-68f9b78996-wcct2 > authpod.log

      First and last errors observed in the operator pod logs:

      I0915 12:35:03.701899       1 status_controller.go:230] clusteroperator/authentication diff {"status":{"conditions":[{"lastTransitionTime":"2025-09-15T04:48:27Z","message":"All is well","reason":"AsExpected","status":"False","type":"Degraded"},{"lastTransitionTime":"2025-09-15T04:59:24Z","message":"AuthenticatorCertKeyProgressing: All is well","reason":"AsExpected","status":"False","type":"Progressing"},{"lastTransitionTime":"2025-09-15T12:35:03Z","message":"OAuthServerRouteEndpointAccessibleControllerAvailable: Get \"https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz\": EOF","reason":"OAuthServerRouteEndpointAccessibleController_EndpointUnavailable","status":"False","type":"Available"},{"lastTransitionTime":"2025-09-15T04:33:55Z","message":"All is well","reason":"AsExpected","status":"True","type":"Upgradeable"},{"lastTransitionTime":"2025-09-15T04:33:54Z","reason":"NoData","status":"Unknown","type":"EvaluationConditionsDetected"}]}}
      E0915 12:35:03.734011       1 base_controller.go:279] "Unhandled Error" err="OAuthServerRouteEndpointAccessibleController reconciliation failed: Get \"https://oauth-openshift.apps.pok-130.ocptest.pok.stglabs.ibm.com/healthz\": EOF"
      
      grep -i '"Unhandled Error"' authpod.log |head -n 1
      E0915 11:06:44.367223       1 base_controller.go:279] "Unhandled Error" err="StatusSyncer_authentication reconciliation failed: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"authentication\": the object has been modified; please apply your changes to the latest version and try again"
      [root@bastion ~]# grep -i '"Unhandled Error"' authpod.log |tail -n 1
      E0915 12:35:04.099102       1 base_controller.go:279] "Unhandled Error" err="StatusSyncer_authentication reconciliation failed: Operation cannot be fulfilled on clusteroperators.config.openshift.io \"authentication\": the object has been modified; please apply your changes to the latest version and try again"
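The first and last "Unhandled Error" entries shown above both come from StatusSyncer_authentication, so they do not show how often the healthz failure itself occurs. A minimal sketch for counting the entries per controller follows. The function name is hypothetical, and the parsing assumes every entry has the `err="<controller> reconciliation failed: ..."` shape seen in the lines quoted above.

```shell
# summarize_unhandled_errors LOGFILE
# Hypothetical helper: prints "<count> <controller>" for each controller that
# logged an "Unhandled Error" reconciliation failure, most frequent first.
summarize_unhandled_errors() {
  grep '"Unhandled Error"' "$1" |
    sed -n 's/.*err="\([^ ]*\) reconciliation failed.*/\1/p' |
    sort | uniq -c | sort -rn
}

# Example:
# summarize_unhandled_errors authpod.log
```

Separating the StatusSyncer_authentication update conflicts from the OAuthServerRouteEndpointAccessibleController failures makes it easier to see which of the two tracks the Available flapping.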
      

      All other operators have shown Available True for about 50 hours (2d2h), but the authentication operator shows only 42 hours:

      [root@bastion ~]# lco |head -n 5
      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.20.0-rc.1   True        False         False      42h     
      baremetal                                  4.20.0-rc.1   True        False         False      2d2h    
      cloud-controller-manager                   4.20.0-rc.1   True        False         False      2d2h    
      cloud-credential                           4.20.0-rc.1   True        False         False      2d2h   
      

              vsing@redhat.com Vikas Singh
              apuranda Amrut Purandare
              ocp-multi-arch-ibm-partners