- Bug
- Resolution: Not a Bug
- Critical
- None
- 4.12.0
- None
- Proposed
- False
Description of problem:
With "platform.gcp.privateDNSZone" specified, the ingress operator failed to create the "*.apps.<cluster-name>..." DNS record, so "wait-for install-complete" failed.
Version-Release number of selected component (if applicable):
$ openshift-install version
openshift-install 4.12.0-0.nightly-2022-10-05-053337
built from commit 84aa8222b622dee71185a45f1e0ba038232b114a
release image registry.ci.openshift.org/ocp/release@sha256:41fe173061b00caebb16e2fd11bac19980d569cd933fdb4fab8351cdda14d58e
release architecture amd64
How reproducible:
Always
Steps to Reproduce:
1. Specify valid "platform.gcp.privateDNSZone" settings, then run an IPI installation.
Actual results:
"wait-for install-complete" failed 2022-10-14T09:45:22.738Z ERROR operator.dns_controller dns/controller.go:359 failed to publish DNS record to zone {"record": {"dnsName":"*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.","targets":["34.70.11.151"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone"}, "error": "googleapi: Error 404: The 'parameters.managedZone' resource named 'project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone' does not exist., notFound"}
Expected results:
Installation should succeed.
Additional info:
$ yq-3.3.0 r work07/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  computeSubnet: installer-shared-vpc-subnet-2
  controlPlaneSubnet: installer-shared-vpc-subnet-1
  createFirewallRules: Disabled
  privateDNSZone:
    id: ci-op-xpn-private-zone
    project: openshift-qe-shared-vpc
  network: installer-shared-vpc
  networkProjectID: openshift-qe-shared-vpc
$ yq-3.3.0 r work07/install-config.yaml baseDomain
qe.gcp.devcluster.openshift.com
$ yq-3.3.0 r work07/install-config.yaml metadata
creationTimestamp: null
name: jiwei-1014-07
$
$ gcloud --project openshift-qe-shared-vpc dns managed-zones list --filter='name=ci-op-xpn-private-zone'
NAME                    DNS_NAME                          DESCRIPTION                        VISIBILITY
ci-op-xpn-private-zone  qe.gcp.devcluster.openshift.com.  Preserved private zone for CI XPN  private
$ gcloud dns managed-zones list --filter='name=qe'
NAME  DNS_NAME                          DESCRIPTION                  VISIBILITY
qe    qe.gcp.devcluster.openshift.com.  Base Domain for QE clusters  public
$
$ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone ci-op-xpn-private-zone --filter='name~jiwei-1014-07'
Listed 0 items.
$ gcloud dns record-sets list --zone qe --filter='name~jiwei-1014-07'
Listed 0 items.
$
$ openshift-install create cluster --dir work07
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Consuming Install Config from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 8:54AM) for the Kubernetes API at https://api.jiwei-1014-07.qe.gcp.devcluster.openshift.com:6443...
INFO API v1.25.0+3ef6ef3 up
INFO Waiting up to 30m0s (until 9:05AM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 9:35AM) for the cluster at https://api.jiwei-1014-07.qe.gcp.devcluster.openshift.com:6443 to initialize...
ERROR Cluster operator authentication Degraded is True with OAuthServerRouteEndpointAccessibleController_SyncError: OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
ERROR Cluster operator authentication Available is False with OAuthServerRouteEndpointAccessibleController_EndpointUnavailable: OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.12.0-0.nightly-2022-10-05-053337, 0 replicas available
ERROR Cluster operator console Available is False with Deployment_InsufficientReplicas::RouteHealth_FailedGet: DeploymentAvailable: 0 replicas available for console deployment
ERROR RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com): Get "https://console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights SCAAvailable is True with Updated: SCA certs successfully updated in the etc-pki-entitlement secret
INFO Cluster operator network ManagementStateDegraded is False with :
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
ERROR failed to initialize the cluster: Cluster operators authentication, console are not available
$
$
$ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone ci-op-xpn-private-zone --filter='name~jiwei-1014-07'
NAME                                                    TYPE  TTL  DATA
api.jiwei-1014-07.qe.gcp.devcluster.openshift.com.      A     60   10.0.0.26
api-int.jiwei-1014-07.qe.gcp.devcluster.openshift.com.  A     60   10.0.0.26
$ gcloud dns record-sets list --zone qe --filter='name~jiwei-1014-07'
NAME                                                TYPE  TTL  DATA
api.jiwei-1014-07.qe.gcp.devcluster.openshift.com.  A     60   35.226.39.140
$
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          62m     Unable to apply 4.12.0-0.nightly-2022-10-05-053337: some cluster operators are not available
$ oc get nodes
NAME                                                         STATUS   ROLES                  AGE   VERSION
jiwei-1014-07-mxsmt-master-0.c.openshift-qe.internal         Ready    control-plane,master   61m   v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-master-1.c.openshift-qe.internal         Ready    control-plane,master   61m   v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-master-2.c.openshift-qe.internal         Ready    control-plane,master   61m   v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-worker-a-csmlt.c.openshift-qe.internal   Ready    worker                 43m   v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-worker-b-mn7ww.c.openshift-qe.internal   Ready    worker                 43m   v1.25.0+3ef6ef3
$ oc get co | grep -v 'True False False'
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication   4.12.0-0.nightly-2022-10-05-053337   False       False         True       58m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
console          4.12.0-0.nightly-2022-10-05-053337   False       True          False      41m     DeploymentAvailable: 0 replicas available for console deployment...
ingress          4.12.0-0.nightly-2022-10-05-053337   True        False         True       42m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
$
$ oc get pods -n openshift-ingress-operator
NAME                                READY   STATUS    RESTARTS      AGE
ingress-operator-5588d4c6f7-q6n8g   2/2     Running   3 (49m ago)   62m
$ oc logs ingress-operator-5588d4c6f7-q6n8g -n openshift-ingress-operator -c ingress-operator
......
2022-10-14T09:45:20.441Z ERROR operator.ingress_controller controller/controller.go:121 got retryable error; requeueing {"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2022-10-14T09:45:22.675Z INFO operator.dns_controller controller/controller.go:121 reconciling {"request": "openshift-ingress-operator/default-wildcard"}
2022-10-14T09:45:22.738Z ERROR operator.dns_controller dns/controller.go:359 failed to publish DNS record to zone {"record": {"dnsName":"*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.","targets":["34.70.11.151"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone"}, "error": "googleapi: Error 404: The 'parameters.managedZone' resource named 'project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone' does not exist., notFound"}
......
$
$ gcloud config get account
ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com
$ gcloud config get project
openshift-qe
$
$ gcloud config get account
jiwei@redhat.com
$ gcloud config get project
openshift-qe
$ gcloud projects get-iam-policy openshift-qe --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com"
ROLE
roles/compute.admin
roles/compute.instanceAdmin.v1
roles/compute.loadBalancerAdmin
roles/compute.storageAdmin
roles/dns.admin
roles/iam.roleViewer
roles/iam.securityAdmin
roles/iam.securityReviewer
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountKeyAdmin
roles/iam.serviceAccountUser
roles/storage.admin
$ gcloud projects get-iam-policy openshift-qe-shared-vpc --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com"
ROLE
roles/compute.networkUser
roles/dns.admin
$
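Every record name in the listings above is derived from metadata.name (jiwei-1014-07) and baseDomain (qe.gcp.devcluster.openshift.com). After the failed install, the private zone ci-op-xpn-private-zone holds api and api-int, and neither zone holds the *.apps wildcard that the ingress operator tried to publish. The hypothetical Python helpers below (not part of any OpenShift tool) build those names and report which are absent from a listing; the set of expected private-zone names is an assumption based only on the records seen in this report.

```python
def expected_private_records(cluster_name: str, base_domain: str) -> dict[str, str]:
    """Fully qualified names (with trailing dot, as gcloud prints them).

    Assumption: the expected set mirrors the records seen in this report.
    """
    domain = f"{cluster_name}.{base_domain.rstrip('.')}."
    return {
        "api": f"api.{domain}",
        "api-int": f"api-int.{domain}",
        "apps-wildcard": f"*.apps.{domain}",
    }


def missing_records(expected: dict[str, str], listed: set[str]) -> list[str]:
    """Keys of expected records whose names do not appear in the listing."""
    return sorted(key for key, name in expected.items() if name not in listed)


if __name__ == "__main__":
    expected = expected_private_records("jiwei-1014-07", "qe.gcp.devcluster.openshift.com")
    # Names copied from the post-install private-zone listing in this report.
    private_zone = {
        "api.jiwei-1014-07.qe.gcp.devcluster.openshift.com.",
        "api-int.jiwei-1014-07.qe.gcp.devcluster.openshift.com.",
    }
    print(missing_records(expected, private_zone))  # prints ['apps-wildcard']
```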
- is related to: CORS-2030 QE Tracker (Closed)