- Story
- Resolution: Done
- Normal
- None
- None
- None
- BU Product Work
- 5
- False
- None
- False
- OCPSTRAT-232 - Enable OpenShift IPI Installer to deploy OCP to a shared VPC in GCP - GA
- Sprint 230
- Proposed
User Story:
I want to install a private cluster through GCP XPN IPI so that no public endpoints are exposed in my shared VPC.
=================================================
QE tested this in 4.12 and hit this issue:
Description of problem:
The ingress operator unexpectedly complains that a worker node should be in the control-plane subnet, so 'wait-for install-complete' fails.
Version-Release number of selected component (if applicable):
$ openshift-install version
openshift-install 4.12.0-0.nightly-2022-10-15-094115
built from commit c5d7528d759ea808dbd3291101ec40fd222e1273
release image registry.ci.openshift.org/ocp/release@sha256:55d8660794fbf33031e83e3b3489dc3718290ccbba38ab056d02ffe4a25274c4
release architecture amd64
How reproducible:
Always
Steps to Reproduce:
1. Try an IPI XPN installation with "publish" set to "Internal", i.e. deploy a private cluster (a minimal install-config sketch follows below).
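For reference, a minimal install-config.yaml sketch assembled from the snippets in "Additional info" below; baseDomain, metadata.name, pullSecret, and sshKey are placeholders here, not values captured from the actual test environment:

apiVersion: v1
baseDomain: <base domain>                  # placeholder
metadata:
  name: <cluster name>                     # placeholder
featureSet: TechPreviewNoUpgrade
platform:
  gcp:
    projectID: openshift-qe
    region: us-central1
    computeSubnet: installer-shared-vpc-subnet-2
    controlPlaneSubnet: installer-shared-vpc-subnet-1
    createFirewallRules: Disabled
    network: installer-shared-vpc
    networkProjectID: openshift-qe-shared-vpc
publish: Internal
pullSecret: '<pull secret>'                # placeholder
sshKey: '<ssh public key>'                 # placeholder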
Actual results:
$ oc get co ingress
NAME      VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
ingress             False       True          True       12s     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-b/instances/jiwei-1017-00-4cbln-worker-b-ddsqz' is expected to be in the subnetwork 'projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' but is in the subnetwork 'projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-2'., wrongSubnetwork...
$
Expected results:
There should be no such error and the installation should succeed.
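For clarity, a hedged sketch of what "succeed" would look like, using only commands already shown in this report (no output is fabricated; exact values depend on the environment):

$ openshift-install wait-for install-complete --dir work11   # should finish without the wrongSubnetwork error
$ oc get co ingress                                          # AVAILABLE=True, PROGRESSING=False, DEGRADED=False expected
$ oc get clusterversion                                      # AVAILABLE=True expected once all operators settle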
Additional info:
1. The Google Cloud credential does have enough permissions:
$ gcloud config get account
jiwei@redhat.com
$ gcloud config get project
openshift-qe
$ gcloud projects get-iam-policy openshift-qe --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-min-permissions@openshift-qe.iam.gserviceaccount.com"
ROLE
roles/compute.admin
roles/compute.instanceAdmin.v1
roles/compute.loadBalancerAdmin
roles/compute.storageAdmin
roles/dns.admin
roles/iam.roleViewer
roles/iam.securityAdmin
roles/iam.securityReviewer
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountKeyAdmin
roles/iam.serviceAccountUser
roles/storage.admin
$ gcloud projects get-iam-policy openshift-qe-shared-vpc --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-min-permissions@openshift-qe.iam.gserviceaccount.com"
ROLE
projects/openshift-qe-shared-vpc/roles/dns.networks.bindPrivateDNSZone
roles/compute.networkUser
$
2. The install-config snippets:
$ gcloud config get account
ipi-xpn-min-permissions@openshift-qe.iam.gserviceaccount.com
$ gcloud config get project
openshift-qe
$
$ mkdir work11
$ cp install-config.yaml work11
$ yq-3.3.0 r work11/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  computeSubnet: installer-shared-vpc-subnet-2
  controlPlaneSubnet: installer-shared-vpc-subnet-1
  createFirewallRules: Disabled
  network: installer-shared-vpc
  networkProjectID: openshift-qe-shared-vpc
$ yq-3.3.0 r work11/install-config.yaml publish
Internal
$ yq-3.3.0 r work11/install-config.yaml featureSet
TechPreviewNoUpgrade
$ yq-3.3.0 r work11/install-config.yaml compute
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    gcp:
      tags:
      - preserved-ipi-xpn-compute
  replicas: 2
$ yq-3.3.0 r work11/install-config.yaml controlPlane
architecture: amd64
hyperthreading: Enabled
name: master
platform:
  gcp:
    tags:
    - preserved-ipi-xpn-control-plane
replicas: 3
$
$ export http_proxy=http://<username>:<password>@<bastion public ip>:3128
$ export https_proxy=http://<username>:<password>@<bastion public ip>:3128
$
3. Try "create cluster" with the above install-config:
$ openshift-install create cluster --dir work11
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Consuming Install Config from target directory
WARNING FeatureSet "TechPreviewNoUpgrade" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster.
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 11:31AM) for the Kubernetes API at https://api.jiwei-1017-00.qe.gcp.devcluster.openshift.com:6443...
INFO API v1.25.2+5bf2e1f up
INFO Waiting up to 30m0s (until 11:44AM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 12:15PM) for the cluster at https://api.jiwei-1017-00.qe.gcp.devcluster.openshift.com:6443 to initialize...
ERROR Cluster operator authentication Degraded is True with OAuthServerRouteEndpointAccessibleController_SyncError: OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.jiwei-1017-00.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1017-00.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
ERROR Cluster operator authentication Available is False with OAuthServerRouteEndpointAccessibleController_EndpointUnavailable: OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1017-00.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1017-00.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudControllerOwner is True with AsExpected: Cluster Cloud Controller Manager Operator owns cloud controllers at 4.12.0-0.nightly-2022-10-15-094115
INFO Cluster operator cluster-api SecretSyncControllerAvailable is True with AsExpected: User Data Secret Controller works as expected
INFO Cluster operator cluster-api SecretSyncControllerDegraded is False with AsExpected: User Data Secret Controller works as expected
INFO Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.12.0-0.nightly-2022-10-15-094115, 0 replicas available
ERROR Cluster operator console Available is False with Deployment_InsufficientReplicas::RouteHealth_FailedGet: DeploymentAvailable: 0 replicas available for console deployment
ERROR RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.jiwei-1017-00.qe.gcp.devcluster.openshift.com): Get "https://console-openshift-console.apps.jiwei-1017-00.qe.gcp.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.jiwei-1017-00.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
ERROR Cluster operator ingress Available is False with IngressUnavailable: The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-b/instances/jiwei-1017-00-4cbln-worker-b-ddsqz' is expected to be in the subnetwork 'projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' but is in the subnetwork 'projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-2'., wrongSubnetwork
ERROR The kube-controller-manager logs may contain more details.)
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
INFO Cluster operator ingress EvaluationConditionsDetected is False with AsExpected:
INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights SCAAvailable is True with Updated: SCA certs successfully updated in the etc-pki-entitlement secret
INFO Cluster operator network ManagementStateDegraded is False with :
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
ERROR failed to initialize the cluster: Cluster operators authentication, console, ingress are not available
$
$ export KUBECONFIG=work11/auth/kubeconfig
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          63m     Unable to apply 4.12.0-0.nightly-2022-10-15-094115: some cluster operators are not available
$ oc get nodes
NAME                                                         STATUS   ROLES                  AGE   VERSION
jiwei-1017-00-4cbln-master-0.c.openshift-qe.internal         Ready    control-plane,master   62m   v1.25.2+5bf2e1f
jiwei-1017-00-4cbln-master-1.c.openshift-qe.internal         Ready    control-plane,master   63m   v1.25.2+5bf2e1f
jiwei-1017-00-4cbln-master-2.c.openshift-qe.internal         Ready    control-plane,master   63m   v1.25.2+5bf2e1f
jiwei-1017-00-4cbln-worker-a-skr9w.c.openshift-qe.internal   Ready    worker                 42m   v1.25.2+5bf2e1f
jiwei-1017-00-4cbln-worker-b-ddsqz.c.openshift-qe.internal   Ready    worker                 42m   v1.25.2+5bf2e1f
$ oc get co | grep -v "True False False"
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication   4.12.0-0.nightly-2022-10-15-094115   False       False         True       58m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1017-00.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1017-00.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
console          4.12.0-0.nightly-2022-10-15-094115   False       True          False      43m     DeploymentAvailable: 0 replicas available for console deployment...
ingress                                               False       True          True       13s     The "default" ingress controller reports Available=False: IngressControllerUnavailable: One or more status conditions indicate unavailable: LoadBalancerReady=False (SyncLoadBalancerFailed: The service-controller component is reporting SyncLoadBalancerFailed events like: Error syncing load balancer: failed to ensure load balancer: googleapi: Error 400: Resource 'projects/openshift-qe/zones/us-central1-b/instances/jiwei-1017-00-4cbln-worker-b-ddsqz' is expected to be in the subnetwork 'projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1' but is in the subnetwork 'projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-2'., wrongSubnetwork...
$
4. Some related GCP resources:
$ gcloud compute instances list --filter='name~jiwei-1017-00'
NAME                                 ZONE           MACHINE_TYPE   PREEMPTIBLE   INTERNAL_IP   EXTERNAL_IP    STATUS
jiwei-1017-00-4cbln-master-0         us-central1-a  n2-standard-4                10.0.0.54                    RUNNING
jiwei-1017-00-4cbln-worker-a-skr9w   us-central1-a  n2-standard-4                10.0.32.56                   RUNNING
jiwei-1017-00-rhel8-bastion          us-central1-a  n1-standard-1                10.0.0.40     35.223.25.38   RUNNING
jiwei-1017-00-4cbln-master-1         us-central1-b  n2-standard-4                10.0.0.55                    RUNNING
jiwei-1017-00-4cbln-worker-b-ddsqz   us-central1-b  n2-standard-4                10.0.32.57                   RUNNING
jiwei-1017-00-4cbln-master-2         us-central1-c  n2-standard-4                10.0.0.53                    RUNNING
$ gcloud compute instances describe jiwei-1017-00-4cbln-master-0 --zone us-central1-a --format json | jq -r .networkInterfaces[0].subnetwork
https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1
$ gcloud compute instances describe jiwei-1017-00-4cbln-master-1 --zone us-central1-b --format json | jq -r .networkInterfaces[0].subnetwork
https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1
$ gcloud compute instances describe jiwei-1017-00-4cbln-master-2 --zone us-central1-c --format json | jq -r .networkInterfaces[0].subnetwork
https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1
$ gcloud compute instances describe jiwei-1017-00-4cbln-worker-a-skr9w --zone us-central1-a --format json | jq -r .networkInterfaces[0].subnetwork
https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-2
$ gcloud compute instances describe jiwei-1017-00-4cbln-worker-b-ddsqz --zone us-central1-b --format json | jq -r .networkInterfaces[0].subnetwork
https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-2
$
$ gcloud --project openshift-qe-shared-vpc compute networks subnets describe installer-shared-vpc-subnet-1 --region us-central1
creationTimestamp: '2022-04-11T06:01:14.454-07:00'
fingerprint: _iSHt-Vys1Y=
gatewayAddress: 10.0.0.1
id: '6501392071863670901'
ipCidrRange: 10.0.0.0/19
kind: compute#subnetwork
name: installer-shared-vpc-subnet-1
network: https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/global/networks/installer-shared-vpc
privateIpGoogleAccess: false
privateIpv6GoogleAccess: DISABLE_GOOGLE_ACCESS
purpose: PRIVATE
region: https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1
stackType: IPV4_ONLY
$
$ gcloud --project openshift-qe-shared-vpc compute networks subnets describe installer-shared-vpc-subnet-2 --region us-central1
creationTimestamp: '2022-04-11T06:01:13.334-07:00'
fingerprint: yC6-bWh3OGc=
gatewayAddress: 10.0.32.1
id: '3231194790049192054'
ipCidrRange: 10.0.32.0/19
kind: compute#subnetwork
name: installer-shared-vpc-subnet-2
network: https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/global/networks/installer-shared-vpc
privateIpGoogleAccess: false
privateIpv6GoogleAccess: DISABLE_GOOGLE_ACCESS
purpose: PRIVATE
region: https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1
selfLink: https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-2
stackType: IPV4_ONLY
$
$ gcloud --project openshift-qe-shared-vpc compute networks describe installer-shared-vpc
autoCreateSubnetworks: false
creationTimestamp: '2022-04-11T06:00:54.339-07:00'
id: '3925591109463021673'
kind: compute#network
name: installer-shared-vpc
networkFirewallPolicyEnforcementOrder: AFTER_CLASSIC_FIREWALL
routingConfig:
  routingMode: REGIONAL
selfLink: https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/global/networks/installer-shared-vpc
selfLinkWithId: https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/global/networks/3925591109463021673
subnetworks:
- https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-2
- https://www.googleapis.com/compute/v1/projects/openshift-qe-shared-vpc/regions/us-central1/subnetworks/installer-shared-vpc-subnet-1
x_gcloud_bgp_routing_mode: REGIONAL
x_gcloud_subnet_mode: CUSTOM
$
5. The must-gather logs: http://virt-openshift-05.lab.eng.nay.redhat.com/jiwei/jiwei-1017-00-4cbln/must-gather.local.2217384992678477281.tar
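One more hedged troubleshooting pointer: the SyncLoadBalancerFailed events quoted above come from the service controller and should also be visible on the ingress controller's LoadBalancer Service. The Service name router-default and namespace openshift-ingress below are the usual defaults, not values captured in this transcript:

$ oc -n openshift-ingress get svc router-default        # the default ingress controller's LoadBalancer Service
$ oc -n openshift-ingress describe svc router-default   # the Events section should show the SyncLoadBalancerFailed / wrongSubnetwork error
$ oc -n openshift-ingress get events --field-selector involvedObject.name=router-default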
- is related to
- OCPBUGS-5755 GCP XPN private cluster install attempts to add masters to k8s-ig-xxxx instance groups - Closed
- links to
- mentioned on