OpenShift Bugs / OCPBUGS-2358

[gcp][CORS-1774] With "platform.gcp.privateDNSZone" specified, the ingress operator fails to configure the "*.apps.<cluster-name>..." DNS record

    Description

      Description of problem:

      With "platform.gcp.privateDNSZone" specified, the ingress operator fails to configure the "*.apps.<cluster-name>..." DNS record, so "wait-for install-complete" fails.

      Version-Release number of selected component (if applicable):

      $ openshift-install version
      openshift-install 4.12.0-0.nightly-2022-10-05-053337
      built from commit 84aa8222b622dee71185a45f1e0ba038232b114a
      release image registry.ci.openshift.org/ocp/release@sha256:41fe173061b00caebb16e2fd11bac19980d569cd933fdb4fab8351cdda14d58e
      release architecture amd64
      

      How reproducible:

      Always

      Steps to Reproduce:

      1. Specify valid "platform.gcp.privateDNSZone" settings, then try IPI installation.
      

      Actual results:

      "wait-for install-complete" failed. The ingress operator log shows that the wildcard DNS record could not be published:
      
      2022-10-14T09:45:22.738Z        ERROR   operator.dns_controller dns/controller.go:359   failed to publish DNS record to zone    {"record": {"dnsName":"*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.","targets":["34.70.11.151"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone"}, "error": "googleapi: Error 404: The 'parameters.managedZone' resource named 'project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone' does not exist., notFound"}
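
One possible reading of this error (an interpretation of the log, not something the report confirms) is that the whole zone ID string, "project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone", was sent as the `managedZone` parameter, where the Cloud DNS API expects only the zone name with the project supplied separately. The sketch below shows how such an ID could be split into its two parts. The helper name `split_zone_id`, the `ZoneRef` type, and the fallback to a default project are hypothetical and are not taken from the ingress operator source.

```python
import re
from typing import NamedTuple


class ZoneRef(NamedTuple):
    """Project and zone name extracted from a zone ID (hypothetical type)."""
    project: str
    zone: str


# Accepts "project/<p>/managedZones/<z>" (the form seen in the log above)
# as well as "projects/<p>/managedZones/<z>".
_ZONE_ID = re.compile(
    r"^projects?/(?P<project>[^/]+)/managedZones/(?P<zone>[^/]+)$"
)


def split_zone_id(zone_id: str, default_project: str) -> ZoneRef:
    """Split a path-style zone ID into (project, zone name).

    Assumption: an ID without a path prefix is a bare zone name that
    lives in the cluster's own project.
    """
    match = _ZONE_ID.match(zone_id)
    if match:
        return ZoneRef(match.group("project"), match.group("zone"))
    return ZoneRef(default_project, zone_id)


if __name__ == "__main__":
    ref = split_zone_id(
        "project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone",
        default_project="openshift-qe",
    )
    print(ref.project, ref.zone)
```

With the values from this report, the project would be "openshift-qe-shared-vpc" and the zone name "ci-op-xpn-private-zone", which matches the zone that `gcloud --project openshift-qe-shared-vpc dns managed-zones list` returns under Additional info.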
      

      Expected results:

      Installation should succeed. 

      Additional info:

      $ yq-3.3.0 r work07/install-config.yaml platform
      gcp:
        projectID: openshift-qe
        region: us-central1
        computeSubnet: installer-shared-vpc-subnet-2
        controlPlaneSubnet: installer-shared-vpc-subnet-1
        createFirewallRules: Disabled
        privateDNSZone:
          id: ci-op-xpn-private-zone
          project: openshift-qe-shared-vpc
        network: installer-shared-vpc
        networkProjectID: openshift-qe-shared-vpc
      $ yq-3.3.0 r work07/install-config.yaml baseDomain
      qe.gcp.devcluster.openshift.com
      $ yq-3.3.0 r work07/install-config.yaml metadata
      creationTimestamp: null
      name: jiwei-1014-07
      $ 
      $ gcloud --project openshift-qe-shared-vpc dns managed-zones list --filter='name=ci-op-xpn-private-zone'
      NAME                    DNS_NAME                          DESCRIPTION                        VISIBILITY
      ci-op-xpn-private-zone  qe.gcp.devcluster.openshift.com.  Preserved private zone for CI XPN  private
      $ gcloud dns managed-zones list --filter='name=qe'
      NAME  DNS_NAME                          DESCRIPTION                  VISIBILITY
      qe    qe.gcp.devcluster.openshift.com.  Base Domain for QE clusters  public
      $ 
      $ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone ci-op-xpn-private-zone --filter='name~jiwei-1014-07'
      Listed 0 items.
      $ gcloud dns record-sets list --zone qe --filter='name~jiwei-1014-07'
      Listed 0 items.
      $ 
      $ openshift-install create cluster --dir work07
      INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
      INFO Consuming Install Config from target directory
      INFO Creating infrastructure resources...
      INFO Waiting up to 20m0s (until 8:54AM) for the Kubernetes API at https://api.jiwei-1014-07.qe.gcp.devcluster.openshift.com:6443...
      INFO API v1.25.0+3ef6ef3 up
      INFO Waiting up to 30m0s (until 9:05AM) for bootstrapping to complete...
      INFO Destroying the bootstrap resources...
      INFO Waiting up to 40m0s (until 9:35AM) for the cluster at https://api.jiwei-1014-07.qe.gcp.devcluster.openshift.com:6443 to initialize...
      ERROR Cluster operator authentication Degraded is True with OAuthServerRouteEndpointAccessibleController_SyncError: OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
      ERROR Cluster operator authentication Available is False with OAuthServerRouteEndpointAccessibleController_EndpointUnavailable: OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
      INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
      INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerAvailable is True with AsExpected: Trusted CA Bundle Controller works as expected
      INFO Cluster operator cloud-controller-manager TrustedCABundleControllerControllerDegraded is False with AsExpected: Trusted CA Bundle Controller works as expected
      INFO Cluster operator cloud-controller-manager CloudConfigControllerAvailable is True with AsExpected: Cloud Config Controller works as expected
      INFO Cluster operator cloud-controller-manager CloudConfigControllerDegraded is False with AsExpected: Cloud Config Controller works as expected
      INFO Cluster operator console Progressing is True with SyncLoopRefresh_InProgress: SyncLoopRefreshProgressing: Working toward version 4.12.0-0.nightly-2022-10-05-053337, 0 replicas available
      ERROR Cluster operator console Available is False with Deployment_InsufficientReplicas::RouteHealth_FailedGet: DeploymentAvailable: 0 replicas available for console deployment
      ERROR RouteHealthAvailable: failed to GET route (https://console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com): Get "https://console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com": dial tcp: lookup console-openshift-console.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host
      INFO Cluster operator etcd RecentBackup is Unknown with ControllerStarted: The etcd backup controller is starting, and will decide if recent backups are available or if a backup is required
      ERROR Cluster operator ingress Degraded is True with IngressDegraded: The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
      INFO Cluster operator insights ClusterTransferAvailable is False with NoClusterTransfer: no available cluster transfer
      INFO Cluster operator insights Disabled is False with AsExpected:
      INFO Cluster operator insights SCAAvailable is True with Updated: SCA certs successfully updated in the etc-pki-entitlement secret
      INFO Cluster operator network ManagementStateDegraded is False with :
      ERROR Cluster initialization failed because one or more operators are not functioning properly.
      ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
      ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
      ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
      ERROR failed to initialize the cluster: Cluster operators authentication, console are not available
      $ 
      $ 
      $ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone ci-op-xpn-private-zone --filter='name~jiwei-1014-07'
      NAME                                                    TYPE  TTL  DATA
      api.jiwei-1014-07.qe.gcp.devcluster.openshift.com.      A     60   10.0.0.26
      api-int.jiwei-1014-07.qe.gcp.devcluster.openshift.com.  A     60   10.0.0.26
      $ gcloud dns record-sets list --zone qe --filter='name~jiwei-1014-07'
      NAME                                                TYPE  TTL  DATA
      api.jiwei-1014-07.qe.gcp.devcluster.openshift.com.  A     60   35.226.39.140
      $ 
      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       True          62m     Unable to apply 4.12.0-0.nightly-2022-10-05-053337: some cluster operators are not available
      $ oc get nodes
      NAME                                                         STATUS   ROLES                  AGE   VERSION
      jiwei-1014-07-mxsmt-master-0.c.openshift-qe.internal         Ready    control-plane,master   61m   v1.25.0+3ef6ef3
      jiwei-1014-07-mxsmt-master-1.c.openshift-qe.internal         Ready    control-plane,master   61m   v1.25.0+3ef6ef3
      jiwei-1014-07-mxsmt-master-2.c.openshift-qe.internal         Ready    control-plane,master   61m   v1.25.0+3ef6ef3
      jiwei-1014-07-mxsmt-worker-a-csmlt.c.openshift-qe.internal   Ready    worker                 43m   v1.25.0+3ef6ef3
      jiwei-1014-07-mxsmt-worker-b-mn7ww.c.openshift-qe.internal   Ready    worker                 43m   v1.25.0+3ef6ef3
      $ oc get co | grep -v 'True        False         False'
      NAME                                       VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.12.0-0.nightly-2022-10-05-053337   False       False         True       58m     OAuthServerRouteEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com on 172.30.0.10:53: no such host (this is likely result of malfunctioning DNS server)
      console                                    4.12.0-0.nightly-2022-10-05-053337   False       True          False      41m     DeploymentAvailable: 0 replicas available for console deployment...
      ingress                                    4.12.0-0.nightly-2022-10-05-053337   True        False         True       42m     The "default" ingress controller reports Degraded=True: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
      $ 
      $ oc get pods -n openshift-ingress-operator
      NAME                                READY   STATUS    RESTARTS      AGE
      ingress-operator-5588d4c6f7-q6n8g   2/2     Running   3 (49m ago)   62m
      $ oc logs ingress-operator-5588d4c6f7-q6n8g -n openshift-ingress-operator -c ingress-operator
      ......
      2022-10-14T09:45:20.441Z        ERROR   operator.ingress_controller     controller/controller.go:121    got retryable error; requeueing{"after": "1m0s", "error": "IngressController is degraded: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)"}
      2022-10-14T09:45:22.675Z        INFO    operator.dns_controller controller/controller.go:121    reconciling     {"request": "openshift-ingress-operator/default-wildcard"}
      2022-10-14T09:45:22.738Z        ERROR   operator.dns_controller dns/controller.go:359   failed to publish DNS record to zone    {"record": {"dnsName":"*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.","targets":["34.70.11.151"],"recordType":"A","recordTTL":30,"dnsManagementPolicy":"Managed"}, "dnszone": {"id":"project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone"}, "error": "googleapi: Error 404: The 'parameters.managedZone' resource named 'project/openshift-qe-shared-vpc/managedZones/ci-op-xpn-private-zone' does not exist., notFound"}
      ......
      $ 
      
      $ gcloud config get account
      ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com
      $ gcloud config get project
      openshift-qe
      $ 
      
      $ gcloud config get account
      jiwei@redhat.com
      $ gcloud config get project
      openshift-qe
      $ gcloud projects get-iam-policy openshift-qe --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com"
      ROLE
      roles/compute.admin
      roles/compute.instanceAdmin.v1
      roles/compute.loadBalancerAdmin
      roles/compute.storageAdmin
      roles/dns.admin
      roles/iam.roleViewer
      roles/iam.securityAdmin
      roles/iam.securityReviewer
      roles/iam.serviceAccountAdmin
      roles/iam.serviceAccountKeyAdmin
      roles/iam.serviceAccountUser
      roles/storage.admin
      $ gcloud projects get-iam-policy openshift-qe-shared-vpc --flatten="bindings[].members" --format="table(bindings.role)" --filter="bindings.members:ipi-xpn-no-fw-permissions@openshift-qe.iam.gserviceaccount.com"
      ROLE
      roles/compute.networkUser
      roles/dns.admin
      $ 
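
The record listings above can be checked mechanically. The following is a minimal sketch that derives the record names a cluster would need from `metadata.name` and `baseDomain`, then reports which ones are absent from a zone listing. The set of expected records (api, api-int and *.apps in the private zone; api and *.apps in the public zone) is an assumption about an externally published IPI cluster, and the function names are hypothetical.

```python
def expected_records(cluster_name: str, base_domain: str) -> dict:
    """Return the DNS names assumed to be needed in each zone."""
    suffix = f"{cluster_name}.{base_domain.rstrip('.')}."
    return {
        "private": {f"api.{suffix}", f"api-int.{suffix}", f"*.apps.{suffix}"},
        "public": {f"api.{suffix}", f"*.apps.{suffix}"},
    }


def missing_records(expected, listed):
    """Names that are expected but not present in the listing."""
    return sorted(set(expected) - set(listed))


if __name__ == "__main__":
    want = expected_records("jiwei-1014-07", "qe.gcp.devcluster.openshift.com")

    # Names copied from the post-install `gcloud dns record-sets list` output.
    private_listed = [
        "api.jiwei-1014-07.qe.gcp.devcluster.openshift.com.",
        "api-int.jiwei-1014-07.qe.gcp.devcluster.openshift.com.",
    ]
    public_listed = [
        "api.jiwei-1014-07.qe.gcp.devcluster.openshift.com.",
    ]

    print("private zone missing:", missing_records(want["private"], private_listed))
    print("public zone missing:", missing_records(want["public"], public_listed))
```

Under these assumptions, the only name missing from both zones is the wildcard "*.apps.jiwei-1014-07.qe.gcp.devcluster.openshift.com.", which is the record the ingress operator failed to publish.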
      

       

       

            People

              Unassigned
              rhn-support-jiwei Jianli Wei
              Hongan Li