Type: Bug
Resolution: Not a Bug
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.12.0
Component/s: Installer / openshift-installer
Labels:
- TestBlocker

Regression:
None
Release Blocker:
Proposed
Blocked:
False
Blocked Reason:
None
Target Version:
4.12.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

With "platform.gcp.privateDNSZone" specified, the ingress operator fails to publish the "*.apps.<cluster-name>..." DNS record, so "wait-for install-complete" fails.
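For reference, the name of the failing record is built from two install-config fields, `metadata.name` and `baseDomain` (both shown under Additional info). The minimal Python sketch below assumes the usual OpenShift layout `*.apps.<metadata.name>.<baseDomain>.` and also checks that such a name falls inside a zone's DNS_NAME. The helper names `wildcard_apps_record` and `is_in_zone` are hypothetical and are not part of any OpenShift API.

```python
def wildcard_apps_record(cluster_name: str, base_domain: str) -> str:
    """Return the fully qualified *.apps wildcard name, with a trailing dot."""
    # Tolerate a base domain given with or without the root dot.
    base = base_domain.rstrip(".")
    return f"*.apps.{cluster_name}.{base}."


def is_in_zone(record_name: str, zone_dns_name: str) -> bool:
    """True if record_name is the zone apex or a name below it (case-insensitive)."""
    record = record_name.rstrip(".").lower()
    zone = zone_dns_name.rstrip(".").lower()
    return record == zone or record.endswith("." + zone)


if __name__ == "__main__":
    name = wildcard_apps_record("jiwei-1014-07", "qe.gcp.devcluster.openshift.com")
    print(name)  # *.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.
    # Both ci-op-xpn-private-zone and qe list this DNS_NAME.
    print(is_in_zone(name, "qe.gcp.devcluster.openshift.com."))  # True
```

The computed name matches the "dnsName" in the ingress operator error, and it belongs under the DNS_NAME that both the private zone and the public "qe" zone report.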

Version-Release number of selected component (if applicable):

$ openshift-install version
openshift-install 4.12.0-0.nightly-2022-10-05-053337
built from commit 84aa8222b622dee71185a45f1e0ba038232b114a
release image registry.ci.openshift.org/ocp/release@sha256:41fe173061b00caebb16e2fd11bac19980d569cd933fdb4fab8351cdda14d58e
release architecture amd64

How reproducible:

Always

Steps to Reproduce:

1. Specify a valid "platform.gcp.privateDNSZone" in install-config.yaml, then run an IPI installation.

Actual results:

"wait-for install-complete" failed, and the ingress operator logged:

2022-10-14T09:45:22.738Z        ERROR   operator.dns_controller dns/controller.go:359   failed to publish DNS record to zone    {"record": {"dnsName":"*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.","targets":["34.70.11.151"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone"}, "error": "googleapi: Error 404: The 'parameters.managedZone' resource named 'project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone' does not exist., notFound"}
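The "dnszone" ID in this error combines the privateDNSZone "project" and "id" values from install-config.yaml into one path-style string. The hypothetical Python helper below splits such an ID back into its project and zone name so it can be compared with what `gcloud --project <project> dns managed-zones list` shows. The `project/<project>/managedZones/<zone>` layout is taken from this log line, not from a documented format, and `ZoneRef`/`parse_zone_id` are illustrative names only.

```python
import re
from typing import NamedTuple, Optional


class ZoneRef(NamedTuple):
    project: str
    zone: str


# "project/<project>/managedZones/<zone>" is the layout seen in the operator log;
# the optional "s" also accepts the "projects/..." form used by GCP resource names.
_ZONE_ID = re.compile(r"^projects?/(?P<project>[^/]+)/managedZones/(?P<zone>[^/]+)$")


def parse_zone_id(zone_id: str) -> Optional[ZoneRef]:
    """Split a path-style zone ID into project and zone; return None for anything else."""
    m = _ZONE_ID.match(zone_id)
    if m is None:
        return None
    return ZoneRef(project=m.group("project"), zone=m.group("zone"))


if __name__ == "__main__":
    ref = parse_zone_id("project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone")
    print(ref)  # ZoneRef(project='openshift-qe-shared-vpc', zone='ci-op-xpn-private-zone')
```

Both parts match "privateDNSZone" in install-config.yaml, and `gcloud --project openshift-qe-shared-vpc dns managed-zones list` does list "ci-op-xpn-private-zone", while the 404 reports the whole path-style string as the 'parameters.managedZone' resource.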

Expected results:

Installation should succeed.

Additional info:

$ yq-3.3.0 r work07/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
  computeSubnet: installer-shared-vpc-subnet-2
  controlPlaneSubnet: installer-shared-vpc-subnet-1
  createFirewallRules: Disabled
  privateDNSZone:
    id: ci-op-xpn-private-zone
    project: openshift-qe-shared-vpc
  network: installer-shared-vpc
  networkProjectID: openshift-qe-shared-vpc
$ yq-3.3.0 r work07/install-config.yaml baseDomain
qe.gcp.devcluster.openshift.com
$ yq-3.3.0 r work07/install-config.yaml metadata
creationTimestamp: null
name: jiwei-1014-07
$ 
$ gcloud --project openshift-qe-shared-vpc dns managed-zones list --filter='name=ci-op-xpn-private-zone'
NAME                    DNS_NAME                          DESCRIPTION                        VISIBILITY
ci-op-xpn-private-zone  qe.gcp.devcluster.openshift.com.  Preserved private zone for CI XPN  private
$ gcloud dns managed-zones list --filter='name=qe'
NAME  DNS_NAME                          DESCRIPTION                  VISIBILITY
qe    qe.gcp.devcluster.openshift.com.  Base Domain for QE clusters  public
$ 
$ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone ci-op-xpn-private-zone --filter='name~jiwei-1014-07'
Listed 0 items.
$ gcloud dns record-sets list --zone qe --filter='name~jiwei-1014-07'
Listed 0 items.
$ 
$ openshift-install create cluster --dir work07
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Consuming Install Config from target directory
INFO Creating infrastructure resources...
INFO Waiting up to 20m0s (until 8:54AM) for the Kubernetes API at https://api.jiwei-1014-07.qe.gcp.devcluster.openshift.com:6443...
INFO API v1.25.0+3ef6ef3 up
INFO Waiting up to 30m0s (until 9:05AM) for bootstrapping to complete...
INFO Destroying the bootstrap resources...
INFO Waiting up to 40m0s (until 9:35AM) for the cluster at https://api.jiwei-1014-07.qe.gcp.devcluster.openshift.com:6443 to initialize...
ERROR Cluster operator authentication Degraded is True with OAuthServerRouteEndpointAccessibleController_SyncError: OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
ERROR Cluster operator authentication Available is False with OAuthServerRouteEndpointAccessibleController_EndpointUnavailable: OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
INFO Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.12.0-0.nightly-2022-10-05-053337, 0 replicas available
ERROR Cluster operator console Available is False with Deployment_InsufficientReplicas::RouteHealth_FailedGet: DeploymentAvailable: 0 replicas available for console deployment
ERROR RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com): Get "https://console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host
INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights SCAAvailable is True with Updated: SCA certs successfully updated in the etc-pki-entitlement secret
INFO Cluster operator network ManagementStateDegraded is False with :
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
ERROR failed to initialize the cluster: Cluster operators authentication, console are not available
$ 
$ 
$ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone ci-op-xpn-private-zone --filter='name~jiwei-1014-07'
NAME                                                    TYPE  TTL  DATA
api.jiwei-1014-07.qe.gcp.devcluster.openshift.com.      A     60   10.0.0.26
api-int.jiwei-1014-07.qe.gcp.devcluster.openshift.com.  A     60   10.0.0.26
$ gcloud dns record-sets list --zone qe --filter='name~jiwei-1014-07'
NAME                                                TYPE  TTL  DATA
api.jiwei-1014-07.qe.gcp.devcluster.openshift.com.  A     60   35.226.39.140
$ 
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          62m     Unable to apply 4.12.0-0.nightly-2022-10-05-053337: some cluster operators are not available
$ oc get nodes
NAME                                                         STATUS   ROLES                  AGE   VERSION
jiwei-1014-07-mxsmt-master-0.c.openshift-qe.internal         Ready    control-plane,master   61m   v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-master-1.c.openshift-qe.internal         Ready    control-plane,master   61m   v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-master-2.c.openshift-qe.internal         Ready    control-plane,master   61m   v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-worker-a-csmlt.c.openshift-qe.internal   Ready    worker                 43m   v1.25.0+3ef6ef3
jiwei-1014-07-mxsmt-worker-b-mn7ww.c.openshift-qe.internal   Ready    worker                 43m   v1.25.0+3ef6ef3
$ oc get co | grep -v 'True        False         False'
NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.12.0-0.nightly-2022-10-05-053337   False       False         True       58m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
console                                    4.12.0-0.nightly-2022-10-05-053337   False       True          False      41m     DeploymentAvailable: 0 replicas available for console deployment...
ingress                                    4.12.0-0.nightly-2022-10-05-053337   True        False         True       42m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
$ 
$ oc get pods -n openshift-ingress-operator
NAME                                READY   STATUS    RESTARTS      AGE
ingress-operator-5588d4c6f7-q6n8g   2/2     Running   3 (49m ago)   62m
$ oc logs ingress-operator-5588d4c6f7-q6n8g -n openshift-ingress-operator -c ingress-operator
......
2022-10-14T09:45:20.441Z        ERROR   operator.ingress_controller     controller/controller.go:121    got retryable error; requeueing{"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
2022-10-14T09:45:22.675Z        INFO    operator.dns_controller controller/controller.go:121    reconciling     {"request": "openshift-ingress-operator/default-wildcard"}
2022-10-14T09:45:22.738Z        ERROR   operator.dns_controller dns/controller.go:359   failed to publish DNS record to zone    {"record": {"dnsName":"*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.","targets":["34.70.11.151"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone"}, "error": "googleapi: Error 404: The 'parameters.managedZone' resource named 'project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone' does not exist., notFound"}
......
$ 

$ gcloud config get account
ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com
$ gcloud config get project
openshift-qe
$ 

$ gcloud config get account
jiwei@redhat.com
$ gcloud config get project
openshift-qe
$ gcloud projects get-iam-policy openshift-qe --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com"
ROLE
roles/compute.admin
roles/compute.instanceAdmin.v1
roles/compute.loadBalancerAdmin
roles/compute.storageAdmin
roles/dns.admin
roles/iam.roleViewer
roles/iam.securityAdmin
roles/iam.securityReviewer
roles/iam.serviceAccountAdmin
roles/iam.serviceAccountKeyAdmin
roles/iam.serviceAccountUser
roles/storage.admin
$ gcloud projects get-iam-policy openshift-qe-shared-vpc --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com"
ROLE
roles/compute.networkUser
roles/dns.admin
$
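The role listings above can also be checked in code. A minimal Python sketch, assuming the text output of the `gcloud projects get-iam-policy ... --format="table(bindings.role)"` commands has been captured as a string. `roles_from_table` and `missing_roles` are hypothetical helpers, and the required set used here (roles/dns.admin in the project that hosts the private zone) is only the role relevant to this report, not a full installer permission list.

```python
from typing import Set


def roles_from_table(table_output: str) -> Set[str]:
    """Parse `--format="table(bindings.role)"` output into a set of role names."""
    lines = [line.strip() for line in table_output.splitlines() if line.strip()]
    # gcloud prints a ROLE header row first; skip it if present.
    if lines and lines[0] == "ROLE":
        lines = lines[1:]
    return set(lines)


def missing_roles(granted: Set[str], required: Set[str]) -> Set[str]:
    """Return the required roles that are not in the granted set."""
    return required - granted


if __name__ == "__main__":
    shared_vpc = roles_from_table("ROLE\nroles/compute.networkUser\nroles/dns.admin\n")
    print(missing_roles(shared_vpc, {"roles/dns.admin"}))  # set()
```

Applied to the two listings above, roles/dns.admin is granted to ipi-xpn-no-fw-permissions in both openshift-qe and openshift-qe-shared-vpc.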

Issue Links:
is related to CORS-2030 QE Tracker (Closed)

Assignee: Unassigned
Reporter: Jianli Wei
QA Contact: Hongan Li
Votes: 0
Watchers: 3
Created: 2022/10/14 10:28 AM
Updated: 2022/10/20 1:45 AM
Resolved: 2022/10/19 6:57 PM
