Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-29220

cluster install failed with azure workload identity

    XMLWordPrintable

Details

    • Important
    • No
    • CLOUD Sprint 249
    • 1
    • Proposed
    • False
    • Hide

      None

      Show
      None

    Description

      Description of problem:

      Install cluster with azure workload identity against 4.16 nightly build, failed as some co are degraded.
      $ oc get co | grep -v "True        False         False"
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.16.0-0.nightly-2024-02-07-200316   False       False         True       153m    OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jima416a1.qe.azure.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jima416a1.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
      console                                    4.16.0-0.nightly-2024-02-07-200316   False       True          True       141m    DeploymentAvailable: 0 replicas available for console deployment...
      ingress                                                                         False       True          True       137m    The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (LoadBalancerPending: The LoadBalancer service is pending)
      
      Ingress LB public IP is pending to be created
      $ oc get svc -n openshift-ingress
      NAME                      TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
      router-default            LoadBalancer   172.30.199.169   <pending>     80:32007/TCP,443:30229/TCP   154m
      router-internal-default   ClusterIP      172.30.112.167   <none>        80/TCP,443/TCP,1936/TCP      154m
      
      
      Detected that CCM pod is CrashLoopBackOff with error
      $ oc get pod -n openshift-cloud-controller-manager
      NAME                                              READY   STATUS             RESTARTS         AGE
      azure-cloud-controller-manager-555cf5579f-hz6gl   0/1     CrashLoopBackOff   21 (2m55s ago)   160m
      azure-cloud-controller-manager-555cf5579f-xv2rn   0/1     CrashLoopBackOff   21 (15s ago)     160m
      
      error in ccm pod:
      I0208 04:40:57.141145       1 azure.go:931] Azure cloudprovider using try backoff: retries=6, exponent=1.500000, duration=6, jitter=1.000000
      I0208 04:40:57.141193       1 azure_auth.go:86] azure: using workload identity extension to retrieve access token
      I0208 04:40:57.141290       1 azure_diskclient.go:68] Azure DisksClient using API version: 2022-07-02
      I0208 04:40:57.141380       1 azure_blobclient.go:73] Azure BlobClient using API version: 2021-09-01
      F0208 04:40:57.141471       1 controllermanager.go:314] Cloud provider azure could not be initialized: could not init cloud provider azure: no token file specified. Check pod configuration or set TokenFilePath in the options

      Version-Release number of selected component (if applicable):

      4.16 nightly build    

      How reproducible:

      Always    

      Steps to Reproduce:

          1. Install cluster with azure workload identity
          2.
          3.
          

      Actual results:

          Installation failed due to some operators are degraded

      Expected results:

          Installation is successful.

      Additional info:

       

      Attachments

        Activity

          People

            joelspeed Joel Speed
            jinyunma Jinyun Ma
            Zhaohua Sun Zhaohua Sun
            Votes:
            0 Vote for this issue
            Watchers:
            7 Start watching this issue

            Dates

              Created:
              Updated: