Bug
Resolution: Not a Bug
Critical
4.12.0
Quality / Stability / Reliability
Proposed
Description of problem:
with "platform.gcp.privateDNSZone" specified, the ingress operator failed to configure "*.apps.<cluster-name>..." dns record, so that "wait-for install-complete" failed
Version-Release number of selected component (if applicable):
$ openshift-install version
openshift-install 4.12.0-0.nightly-2022-10-05-053337
built from commit 84aa8222b622dee71185a45f1e0ba038232b114a
release image registry.ci.openshift.org/ocp/release@sha256:41fe173061b00caebb16e2fd11bac19980d569cd933fdb4fab8351cdda14d58e
release architecture amd64
How reproducible:
Always
Steps to Reproduce:
1. Specify valid "platform.gcp.privateDNSZone" settings in install-config.yaml, then attempt an IPI installation.
Actual results:
"wait-for install-complete" failed
2022-10-14T09:45:22.738Z ERROR operator.dns_controller dns/controller.go:359 failed to publish DNS record to zone {"record": {"dnsName":"*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.","targets":["34.70.11.151"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone"}, "error": "googleapi: Error 404: The 'parameters.managedZone' resource named 'project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone' does not exist., notFound"}
Expected results:
Installation should succeed.
Additional info:
$ yq-3.3.0 r work07/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  computeSubnet: installer-shared-vpc-subnet-2
  controlPlaneSubnet: installer-shared-vpc-subnet-1
  createFirewallRules: Disabled
  privateDNSZone:
    id: ci-op-xpn-private-zone
    project: openshift-qe-shared-vpc
  network: installer-shared-vpc
  networkProjectID: openshift-qe-shared-vpc
$ yq-3.3.0 r work07/install-config.yaml baseDomain
qe.gcp.devcluster.openshift.com
$ yq-3.3.0 r work07/install-config.yaml metadata
creationTimestamp: null
name: jiwei-1014-07
$
$ gcloud --project openshift-qe-shared-vpc dns managed-zones list --filter='name=ci-op-xpn-private-zone'
NAME DNS_NAME DESCRIPTION VISIBILITY
ci-op-xpn-private-zone qe.gcp.devcluster.openshift.com. Preserved private zone for CI XPN private
$ gcloud dns managed-zones list --filter='name=qe'
NAME DNS_NAME DESCRIPTION VISIBILITY
qe qe.gcp.devcluster.openshift.com. Base Domain for QE clusters public
$
$ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone ci-op-xpn-private-zone --filter='name~jiwei-1014-07'
Listed 0 items.
$ gcloud dns record-sets list --zone qe --filter='name~jiwei-1014-07'
Listed 0 items.
$
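If more detail on the private zone is needed (for example, which networks it is attached to), the zone can also be described directly; this check was not part of the original reproduction:

$ gcloud --project openshift-qe-shared-vpc dns managed-zones describe ci-op-xpn-private-zone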
$ openshift-install create cluster --dir work07
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Consuming Install Config from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 8:54AM) for the Kubernetes API at https://api.jiwei-1014-07.qe.gcp.devcluster.openshift.com:6443...
INFO API v1.25.0+3ef6ef3 up
INFO Waiting up to 30m0s (until 9:05AM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 9:35AM) for the cluster at https://api.jiwei-1014-07.qe.gcp.devcluster.openshift.com:6443 to initialize...
ERROR Cluster operator authentication Degraded is True with OAuthServerRouteEndpointAccessibleController_SyncError: OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
ERROR Cluster operator authentication Available is False with OAuthServerRouteEndpointAccessibleController_EndpointUnavailable: OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.12.0-0.nightly-2022-10-05-053337, 0 replicas available
ERROR Cluster operator console Available is False with Deployment_InsufficientReplicas::RouteHealth_FailedGet: DeploymentAvailable: 0 replicas available for console deployment
ERROR RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com): Get "https://console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights SCAAvailable is True with Updated: SCA certs successfully updated in the etc-pki-entitlement secret
INFO Cluster operator network ManagementStateDegraded is False with :
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
ERROR failed to initialize the cluster: Cluster operators authentication, console are not available
$
$
$ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone ci-op-xpn-private-zone --filter='name~jiwei-1014-07'
NAME TYPE TTL DATA
api.jiwei-1014-07.qe.gcp.devcluster.openshift.com. A 60 10.0.0.26
api-int.jiwei-1014-07.qe.gcp.devcluster.openshift.com. A 60 10.0.0.26
$ gcloud dns record-sets list --zone qe --filter='name~jiwei-1014-07'
NAME TYPE TTL DATA
api.jiwei-1014-07.qe.gcp.devcluster.openshift.com. A 60 35.226.39.140
$
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version False True 62m Unable to apply 4.12.0-0.nightly-2022-10-05-053337: some cluster operators are not available
$ oc get nodes
NAME STATUS ROLES AGE VERSION
jiwei-1014-07-mxsmt-master-0.c.openshift-qe.internal Ready control-plane,master 61m v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-master-1.c.openshift-qe.internal Ready control-plane,master 61m v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-master-2.c.openshift-qe.internal Ready control-plane,master 61m v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-worker-a-csmlt.c.openshift-qe.internal Ready worker 43m v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-worker-b-mn7ww.c.openshift-qe.internal Ready worker 43m v1.25.0+3ef6ef3
$ oc get co | grep -v 'True False False'
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
authentication 4.12.0-0.nightly-2022-10-05-053337 False False True 58m OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
console 4.12.0-0.nightly-2022-10-05-053337 False True False 41m DeploymentAvailable: 0 replicas available for console deployment...
ingress 4.12.0-0.nightly-2022-10-05-053337 True False True 42m The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
$
$ oc get pods -n openshift-ingress-operator
NAME READY STATUS RESTARTS AGE
ingress-operator-5588d4c6f7-q6n8g 2/2 Running 3 (49m ago) 62m
$ oc logs ingress-operator-5588d4c6f7-q6n8g -n openshift-ingress-operator -c ingress-operator
......
2022-10-14T09:45:20.441Z ERROR operator.ingress_controller controller/controller.go:121 got retryable error; requeueing{"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2022-10-14T09:45:22.675Z INFO operator.dns_controller controller/controller.go:121 reconciling {"request": "openshift-ingress-operator/default-wildcard"}
2022-10-14T09:45:22.738Z ERROR operator.dns_controller dns/controller.go:359 failed to publish DNS record to zone {"record": {"dnsName":"*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.","targets":["34.70.11.151"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone"}, "error": "googleapi: Error 404: The 'parameters.managedZone' resource named 'project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone' does not exist., notFound"}
......
$
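As a hypothetical sanity check (not attempted in this report) that the zone is writable with these credentials, the wildcard record the operator failed to publish could be created manually, using the values from the operator log above:

$ gcloud --project openshift-qe-shared-vpc dns record-sets create '*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.' \
    --zone=ci-op-xpn-private-zone --type=A --ttl=30 --rrdatas=34.70.11.151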
$ gcloud config get account
ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com
$ gcloud config get project
openshift-qe
$
$ gcloud config get account
jiwei@redhat.com
$ gcloud config get project
openshift-qe
$ gcloud projects get-iam-policy openshift-qe --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com"
ROLE
roles/compute.admin
roles/compute.instanceAdmin.v1
roles/compute.loadBalancerAdmin
roles/compute.storageAdmin
roles/dns.admin
roles/iam.roleViewer
roles/iam.securityAdmin
roles/iam.securityReviewer
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountKeyAdmin
roles/iam.serviceAccountUser
roles/storage.admin
$ gcloud projects get-iam-policy openshift-qe-shared-vpc --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com"
ROLE
roles/compute.networkUser
roles/dns.admin
$
is related to: CORS-2030 QE Tracker (Closed)