OpenShift Bugs / OCPBUGS-15421

GCP XPN Installs fail when authenticating with CLI

    • Important
    • No
    • 0
    • Sprint 240, Sprint 241, Sprint 242
    • 3
    • Rejected
    • False
    • None
    • Previously, shared VPC installations on GCP using passthrough credentials mode could fail because the installation program used credentials from the default service account. With this update, you can specify another service account to use for node creation instead of the default. (link:https://issues.redhat.com/browse/OCPBUGS-15421[*OCPBUGS-15421*])
    • Bug Fix
    • Done

      Description of problem:

      When openshift-install is authenticated with the gcloud CLI rather than with a service account key file, the installer throws an error, because the code at https://github.com/openshift/installer/blob/master/pkg/asset/machines/gcp/machines.go#L170-L178 always expects to extract a service account from the credentials to pass through to the nodes in XPN installs.

      An alternative approach would be to handle a missing service account without an error, and to allow the required service accounts to be passed in through another mechanism.
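      The alternative approach above can be sketched in Go. This is a minimal sketch under stated assumptions, not the installer's code: `nodeServiceAccount` and `credentialsFile` are hypothetical names, and the precedence (an explicitly configured account such as controlPlane.platform.gcp.serviceAccount first, then the key file's `client_email`) is an assumption modeled on the release note. The only format detail it relies on is that a service account key file has `"type": "service_account"` and a `client_email`, while the Application Default Credentials saved by `gcloud auth application-default login` have `"type": "authorized_user"` and no e-mail.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// credentialsFile holds only the fields this sketch needs from a Google
// credentials JSON document.
type credentialsFile struct {
	Type        string `json:"type"`
	ClientEmail string `json:"client_email"`
}

// nodeServiceAccount is a HYPOTHETICAL helper. It prefers an explicitly
// configured service account and only falls back to the e-mail embedded in
// a service account key. For user credentials (gcloud CLI) it returns ""
// instead of an error, leaving the caller to decide whether one is required.
func nodeServiceAccount(configured string, credsJSON []byte) (string, error) {
	if configured != "" {
		return configured, nil
	}
	var creds credentialsFile
	if err := json.Unmarshal(credsJSON, &creds); err != nil {
		return "", fmt.Errorf("parsing credentials: %w", err)
	}
	switch creds.Type {
	case "service_account":
		if creds.ClientEmail == "" {
			return "", errors.New("service account credentials have no client_email")
		}
		return creds.ClientEmail, nil
	default:
		// authorized_user or other non-key credentials: no service account
		// can be derived, and that is not treated as an error here.
		return "", nil
	}
}

func main() {
	userCreds := []byte(`{"type":"authorized_user","client_id":"example","client_secret":"example","refresh_token":"example"}`)

	sa, err := nodeServiceAccount("", userCreds)
	fmt.Printf("gcloud user credentials: service account %q, error %v\n", sa, err)

	sa, err = nodeServiceAccount("ipi-xpn-minpt-permissions@openshift-qe.iam.gserviceaccount.com", userCreds)
	fmt.Printf("explicit service account: %q, error %v\n", sa, err)
}
```

      Returning an empty string rather than an error moves the decision to the caller, which can then demand an explicit service account only where one is actually needed.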

      Version-Release number of selected component (if applicable):

       

      How reproducible:

       

      Steps to Reproduce:

      1. Create an install config for a GCP XPN install.
      2. Authenticate the installer without a service account key file (either through gcloud CLI auth or through a VM).
      

      Actual results:

       

      Expected results:

       

      Additional info:

       


            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Important: OpenShift Container Platform 4.14.0 bug fix and security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2023:5006


            Jianli Wei added a comment -

            > Comments indicated that the tests passed. So, should this actually be in verified state?

            sdasu@redhat.com Yes, let's move it to verified status, according to my comment and build info. Thanks! 


            Brent Barbachem added a comment -

            rhn-support-jiwei Yes please.

            Jianli Wei added a comment -

            > Are you able to test this in manual mode?

            padillon If in manual mode, should I still specify a service account for the control plane in the install-config? Thanks!


            Patrick Dillon added a comment -

            rhn-support-jiwei Are you able to test this in manual mode? I believe the acceptance criteria for this bug should be based only on manual mode; the other methods would be addressed in https://issues.redhat.com//browse/OCPBUGS-17757

            Jianli Wei added a comment -

            FYI the must-gather is available at https://drive.google.com/file/d/1NRbqGuqLBciWd2uwjTzFf3Lo1IjaGp-V/view?usp=drive_link.

            (1) The GCP auth settings

            $ gcloud auth login 
            Go to the following link in your browser:
            
                <login url>
            
            Enter authorization code: <auth code>
            
            You are now logged in as [jiwei@redhat.com].
            Your current project is [openshift-qe].  You can change this setting by running:
              $ gcloud config set project PROJECT_ID
            $ gcloud auth application-default login 
            Go to the following link in your browser:
            
                <login url>
            
            Enter authorization code: <auth code>
            
            Credentials saved to file: [/home/fedora/.config/gcloud/application_default_credentials.json]
            
            These credentials will be used by any library that requests Application Default Credentials (ADC).
            
            Quota project "openshift-qe" was added to ADC which can be used by Google client libraries for billing and quota. Note that some services may still bill the project owning the resource.
            $ ls ~/.gcp/osServiceAccount.json
            ls: cannot access '/home/fedora/.gcp/osServiceAccount.json': No such file or directory
            $ 
            $ echo $GOOGLE_APPLICATION_CREDENTIALS
            
            $ 

            (2) Scenario A: IPI XPN installation, using gcloud CLI default credentials, and with controlPlane.platform.gcp.serviceAccount settings

            $ openshift-install version
            openshift-install 4.14.0-0.nightly-2023-08-26-232738
            built from commit 41e3c2e9eacce4b97761ba64ec19e7a0fff0daa3
            release image registry.ci.openshift.org/ocp/release@sha256:789735d63c63e8efc3c144e09093859e2c8210e3a651e7d7ca882ff93840d407
            release architecture amd64
            $ 
            $ yq-3.3.0 r test2/install-config.yaml platform
            gcp:
              projectID: openshift-qe
              region: us-central1
              network: installer-shared-vpc
              controlPlaneSubnet: installer-shared-vpc-subnet-1
              computeSubnet: installer-shared-vpc-subnet-2
              networkProjectID: openshift-qe-shared-vpc
            $ yq-3.3.0 r test2/install-config.yaml credentialsMode
            Passthrough
            $ yq-3.3.0 r test2/install-config.yaml baseDomain
            qe.gcp.devcluster.openshift.com
            $ yq-3.3.0 r test2/install-config.yaml metadata
            creationTimestamp: null
            name: jiwei-0827a
            $ yq-3.3.0 r test2/install-config.yaml compute
            - architecture: amd64
              hyperthreading: Enabled
              name: worker
              platform:
                gcp:
                  tags:
                  - preserved-ipi-xpn-compute
              replicas: 2
            $ yq-3.3.0 r test2/install-config.yaml controlPlane
            architecture: amd64
            hyperthreading: Enabled
            name: master
            platform:
              gcp:
                serviceAccount: ipi-xpn-minpt-permissions@openshift-qe.iam.gserviceaccount.com
                tags:
                - preserved-ipi-xpn-control-plane
            replicas: 3
            $ openshift-install create cluster --dir test2
            INFO Credentials loaded from gcloud CLI defaults  
            INFO Consuming Install Config from target directory 
            INFO Creating infrastructure resources...         
            INFO Waiting up to 20m0s (until 6:14PM CST) for the Kubernetes API at https://api.jiwei-0827a.qe.gcp.devcluster.openshift.com:6443... 
            INFO API v1.27.4+d424288 up                       
            INFO Waiting up to 30m0s (until 6:26PM CST) for bootstrapping to complete... 
            INFO Destroying the bootstrap resources...        
            INFO Waiting up to 40m0s (until 6:50PM CST) for the cluster at https://api.jiwei-0827a.qe.gcp.devcluster.openshift.com:6443 to initialize... 
            ...output omitted...
            ERROR Cluster initialization failed because one or more operators are not functioning properly.
            ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
            ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
            ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
            ERROR failed to initialize the cluster: Cluster operators authentication, console, control-plane-machine-set, image-registry, ingress, machine-api, monitoring are not available
            $ oc get co | grep -v 'True        False         False'
            NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
            authentication                             4.14.0-0.nightly-2023-08-26-232738   False       False         True       56m     OAuthServerRouteEndpointAccessibleControllerAvailable: failed to retrieve route from cache: route.route.openshift.io "oauth-openshift" not found...
            cloud-credential                           4.14.0-0.nightly-2023-08-26-232738   True        True          True       61m     4 of 7 credentials requests are failing to sync.
            cluster-autoscaler                                                              True        False         True       55m     machine-api not ready
            console                                    4.14.0-0.nightly-2023-08-26-232738   False       False         True       48m     RouteHealthAvailable: console route is not admitted
            control-plane-machine-set                  4.14.0-0.nightly-2023-08-26-232738   False       False         True       56m     Missing 3 available replica(s)
            image-registry                                                                  False       True          True       48m     Available: The deployment does not exist...
            ingress                                                                         False       True          True       49m     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.)
            machine-api                                                                     False       True          True       55m     Operator is initializing
            monitoring                                                                      False       True          True       45m     reconciling Prometheus Operator Admission Webhook Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-operator-admission-webhook: context deadline exceeded
            network                                    4.14.0-0.nightly-2023-08-26-232738   True        True          False      58m     Deployment "/openshift-cloud-network-config-controller/cloud-network-config-controller" is waiting for other operators to become ready...
            $ oc logs cloud-credential-operator-848ff77cc6-tqp5d -n openshift-cloud-credential-operator -c cloud-credential-operator | grep level=error
            ...output omitted...
            time="2023-08-27T10:58:02Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/openshift-image-registry-gcs secret=openshift-image-registry/installer-cloud-credentials
            time="2023-08-27T10:58:11Z" level=error msg="error syncing credentials: cloud root creds do not have enough permissions to be used as-is" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-gcp secret=openshift-machine-api/gcp-cloud-credentials
            time="2023-08-27T10:58:11Z" level=error msg="errored with condition: CredentialsProvisionFailure" controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-gcp secret=openshift-machine-api/gcp-cloud-credentials
            $ 
            $ gcloud iam service-accounts list --format="table(email,displayName,projectId)" --filter="displayName~jiwei-0827a"
            EMAIL                                                     DISPLAY NAME                   PROJECT_ID
            jiwei-0827a-6hmhn-w@openshift-qe.iam.gserviceaccount.com  jiwei-0827a-6hmhn-worker-node  openshift-qe
            $ gcloud projects get-iam-policy openshift-qe --flatten='bindings[].members' --format='table(bindings.role)' --filter='bindings.members:jiwei-0827a-6hmhn-w@openshift-qe.iam.gserviceaccount.com'
            ROLE
            roles/compute.viewer
            roles/storage.admin
            $ gcloud projects get-iam-policy openshift-qe-shared-vpc --flatten='bindings[].members' --format='table(bindings.role)' --filter='bindings.members:jiwei-0827a-6hmhn-w@openshift-qe.iam.gserviceaccount.com'


            Patrick Dillon added a comment -

            rhn-support-jiwei Yes, if you are able to provide the must-gather, that should help as well.

            Jianli Wei added a comment -

            > can you confirm that the service account being used in the cluster includes both of these permissions?

            padillon If you mean the service account for the worker nodes, it seems not; see below.

             

            $ gcloud compute instances list --filter='name~jiwei'
            NAME                        ZONE           MACHINE_TYPE   PREEMPTIBLE  INTERNAL_IP  EXTERNAL_IP  STATUS
            jiwei-0818a-6797h-master-0  us-central1-a  n2-standard-4               10.0.0.53                 RUNNING
            jiwei-0818a-6797h-master-1  us-central1-b  n2-standard-4               10.0.0.46                 RUNNING
            jiwei-0818a-6797h-master-2  us-central1-c  n2-standard-4               10.0.0.40                 RUNNING
            $ gcloud iam service-accounts list --filter='displayName~jiwei-0818'
            DISPLAY NAME                   EMAIL                                                     DISABLED
            jiwei-0818a-6797h-worker-node  jiwei-0818a-6797h-w@openshift-qe.iam.gserviceaccount.com  False
            $ ./list_roles.sh jiwei-0818a-6797h-w@openshift-qe.iam.gserviceaccount.com
            
            Running Command: gcloud projects get-iam-policy openshift-qe --flatten='bindings[].members' --format='table(bindings.role)' --filter='bindings.members:jiwei-0818a-6797h-w@openshift-qe.iam.gserviceaccount.com'
            
            ROLE
            roles/compute.viewer
            roles/storage.admin
            
            Running Command: gcloud projects get-iam-policy openshift-qe-shared-vpc --flatten='bindings[].members' --format='table(bindings.role)' --filter='bindings.members:jiwei-0818a-6797h-w@openshift-qe.iam.gserviceaccount.com'
            
            
            $ gcloud iam service-accounts describe jiwei-0818a-6797h-w@openshift-qe.iam.gserviceaccount.com
            description: Created By OpenShift Installer
            displayName: jiwei-0818a-6797h-worker-node
            email: jiwei-0818a-6797h-w@openshift-qe.iam.gserviceaccount.com
            etag: MDEwMjE5MjA=
            name: projects/openshift-qe/serviceAccounts/jiwei-0818a-6797h-w@openshift-qe.iam.gserviceaccount.com
            oauth2ClientId: '116266542425598076560'
            projectId: openshift-qe
            uniqueId: '116266542425598076560'
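            The contents of `list_roles.sh` are not shown in this thread. Judging only from the "Running Command:" lines it prints, a roughly equivalent Go sketch runs the same `gcloud projects get-iam-policy` query against the service project and the shared-VPC host project. The two project IDs are the ones used in this bug, hard-coded for illustration, and `iamPolicyArgs` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// iamPolicyArgs builds the gcloud arguments that list the roles bound to
// serviceAccount in project. No shell quoting is needed because exec passes
// each argument to gcloud verbatim.
func iamPolicyArgs(project, serviceAccount string) []string {
	return []string{
		"projects", "get-iam-policy", project,
		"--flatten=bindings[].members",
		"--format=table(bindings.role)",
		"--filter=bindings.members:" + serviceAccount,
	}
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: list_roles SERVICE_ACCOUNT_EMAIL")
		os.Exit(2)
	}
	sa := os.Args[1]
	// Service project and shared-VPC host project from this bug (illustrative).
	for _, project := range []string{"openshift-qe", "openshift-qe-shared-vpc"} {
		args := iamPolicyArgs(project, sa)
		fmt.Println("Running Command: gcloud", strings.Join(args, " "))
		cmd := exec.Command("gcloud", args...)
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintln(os.Stderr, "gcloud failed:", err)
		}
	}
}
```

            An empty table for openshift-qe-shared-vpc, as in the output above, means the service account has no role bindings in the host project.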

             

            > You may also want to inspect the root credentials to confirm which service account is being used.

            I'm not sure what exactly you mean by "the root credentials". In any case, the CLI default credential in use has "roles/owner" in both the service project and the host project, and the service account for the control-plane machines has the required permissions in both projects (according to the OCP docs).

            > If you are able to reproduce this error can you obtain further error messages about why the cloud credential operator is not able to generate the requested credentials?

            Do you think the must-gather logs would have such info? Thanks!


            Patrick Dillon added a comment - - edited

            rhn-support-jiwei Thanks for your help here.

            > rh-ee-bbarbach padillon I don't think the above statement is true, for example, see here, where the IPI installation does succeed with environmental auth and credentialsMode being the default Mint mode. 

            You are correct. I misidentified the problem. It seems like this should indeed work with Passthrough mode and we do not need to add any further credential restrictions. I misunderstood the original error message in your scenario A:
              Message:               Failed to check if machine exists: jiwei-0703a-jht6l-worker-a-psjb7: failed to create scope for machine: error getting credentials secret "gcp-cloud-credentials" in namespace "openshift-machine-api": Secret "gcp-cloud-credentials" not found
            I believe we need more information to diagnose this error. It appears that the cloud-credential operator is not able to fulfill the cred request of the machine-api operator. Looking at the credential requests for GCP, I see that the machine API operator expects:

                predefinedRoles:
                - roles/compute.admin
                - roles/iam.serviceAccountUser

             

            rhn-support-jiwei can you confirm that the service account being used in the cluster includes both of these permissions? If you are able to reproduce this error can you obtain further error messages about why the cloud credential operator is not able to generate the requested credentials? You may also want to inspect the root credentials to confirm which service account is being used.

            Thank you!
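            Checking the role lists printed in this thread against the two predefined roles above is a set difference. The Go sketch below compares role names only, which is an assumption that makes it stricter than a real permission check: it ignores permissions granted through broader roles such as roles/owner, and it is not how the cloud-credential operator evaluates credentials. `missingRoles` is a hypothetical helper; the granted roles are the ones listed for the worker service account in project openshift-qe.

```go
package main

import (
	"fmt"
	"sort"
)

// missingRoles returns the required roles that do not appear in granted,
// sorted for stable output. It compares role names only.
func missingRoles(required, granted []string) []string {
	have := make(map[string]bool, len(granted))
	for _, r := range granted {
		have[r] = true
	}
	var missing []string
	for _, r := range required {
		if !have[r] {
			missing = append(missing, r)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	// predefinedRoles from the machine-api credentials request quoted above.
	required := []string{"roles/compute.admin", "roles/iam.serviceAccountUser"}
	// Roles shown for the worker service account in openshift-qe.
	granted := []string{"roles/compute.viewer", "roles/storage.admin"}
	fmt.Println("missing:", missingRoles(required, granted))
}
```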


            Jianli Wei added a comment -

            > Is it possible that your system had GOOGLE_APPLICATION_CREDENTIALS set as an environment variable in your examples above?

            rh-ee-bbarbach I don't think so (see below). 

            [fedora@preserve-jiwei ~]$ echo $GOOGLE_APPLICATION_CREDENTIALS
            
            [fedora@preserve-jiwei ~]$  

            Besides, when the installation starts, an INFO log line reports that the gcloud CLI default credentials are being used:

            $ openshift-install create cluster --dir test1
            INFO Credentials loaded from gcloud CLI defaults
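            The checks run by hand in this thread ($GOOGLE_APPLICATION_CREDENTIALS, ~/.gcp/osServiceAccount.json, and the ADC file saved by `gcloud auth application-default login`) can be combined into one helper that reports where credentials would come from. The lookup order in this Go sketch is an assumption for illustration only; the actual openshift-install loader may consult other environment variables or use a different order. `credentialSource` and `fileExists` are hypothetical names, and the environment and filesystem lookups are passed in so the logic can be exercised with fakes.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// credentialSource reports where GCP credentials would be read from.
// ASSUMED order: env var, then installer key file, then gcloud ADC file.
func credentialSource(getenv func(string) string, exists func(string) bool, home string) (string, error) {
	if p := getenv("GOOGLE_APPLICATION_CREDENTIALS"); p != "" {
		return "env GOOGLE_APPLICATION_CREDENTIALS: " + p, nil
	}
	keyFile := filepath.Join(home, ".gcp", "osServiceAccount.json")
	if exists(keyFile) {
		return "service account key file: " + keyFile, nil
	}
	adc := filepath.Join(home, ".config", "gcloud", "application_default_credentials.json")
	if exists(adc) {
		return "gcloud CLI defaults: " + adc, nil
	}
	return "", errors.New("no GCP credentials found")
}

// fileExists reports whether path can be stat'ed.
func fileExists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	src, err := credentialSource(os.Getenv, fileExists, home)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("credentials would be loaded from", src)
}
```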
             


              rh-ee-bbarbach Brent Barbachem
              padillon Patrick Dillon
              Jianli Wei Jianli Wei