[OCPBUGS-2351] [gcp][CORS-1774] with "platform.gcp.publicDNSZone" specified, the installer has issue creating dns record-sets for api_external

    • Sprint 226

      Description of problem:

      With "platform.gcp.publicDNSZone" specified, the installer fails to create the DNS record-set for api_external.

      Version-Release number of selected component (if applicable):

      $ openshift-install version
      openshift-install 4.12.0-0.nightly-2022-10-05-053337
      built from commit 84aa8222b622dee71185a45f1e0ba038232b114a
      release image registry.ci.openshift.org/ocp/release@sha256:41fe173061b00caebb16e2fd11bac19980d569cd933fdb4fab8351cdda14d58e
      release architecture amd64
      

      How reproducible:

      Always

      Steps to Reproduce:

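      Scenario A: Set baseDomain to the public DNS zone of the service project (qe.gcp.devcluster.openshift.com), then attempt an IPI installation.
      Scenario B: Set baseDomain to the public DNS zone of the host project (qe-shared-vpc.qe.gcp.devcluster.openshift.com), then attempt an IPI installation.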

      Actual results:

      Both scenarios failed.
      
      Scenario A: 
      ERROR Error: Error creating DNS RecordSet: googleapi: Error 400: Invalid value for 'entity.change.additions[0].name': 'api.jiwei-1014-04.qe.gcp.devcluster.openshift.com.', invalid
      
      Scenario B: 
      FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": baseDomain: Internal error: no matching public DNS Zone found

      Expected results:

      Either Scenario A or Scenario B should succeed. 

      Additional info:

      $ gcloud dns managed-zones list --filter='name=qe'
      NAME  DNS_NAME                          DESCRIPTION                  VISIBILITY
      qe    qe.gcp.devcluster.openshift.com.  Base Domain for QE clusters  public
      $ gcloud --project openshift-qe-shared-vpc dns managed-zones list --filter='name=qe-shared-vpc'
      NAME           DNS_NAME                                        DESCRIPTION  VISIBILITY
      qe-shared-vpc  qe-shared-vpc.qe.gcp.devcluster.openshift.com.               public
      $ 
      
      $ yq-3.3.0 r work04/install-config.yaml platform
      gcp:
        projectID: openshift-qe  
        region: us-central1
        computeSubnet: installer-shared-vpc-subnet-2
        controlPlaneSubnet: installer-shared-vpc-subnet-1
        createFirewallRules: Disabled
        publicDNSZone:
          id: qe-shared-vpc
          project: openshift-qe-shared-vpc
        network: installer-shared-vpc
        networkProjectID: openshift-qe-shared-vpc
      $ yq-3.3.0 r work04/install-config.yaml baseDomain
      qe.gcp.devcluster.openshift.com
      $ 
      $ openshift-install create cluster --dir work04
      INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
      INFO Consuming Install Config from target directory
      INFO Creating infrastructure resources...
      ERROR
      ERROR Error: Error creating DNS RecordSet: googleapi: Error 400: Invalid value for 'entity.change.additions[0].name': 'api.jiwei-1014-04.qe.gcp.devcluster.openshift.com.', invalid
      ERROR
      ERROR   with module.dns.google_dns_record_set.api_external[0],
      ERROR   on dns/base.tf line 22, in resource "google_dns_record_set" "api_external":
      ERROR   22: resource "google_dns_record_set" "api_external" {
      ERROR
      ERROR failed to fetch Cluster: failed to generate asset "Cluster": failure applying terraform for "cluster" stage: failed to create cluster: failed to apply Terraform: exit status 1
      ERROR
      ERROR Error: Error creating DNS RecordSet: googleapi: Error 400: Invalid value for 'entity.change.additions[0].name': 'api.jiwei-1014-04.qe.gcp.devcluster.openshift.com.', invalid
      ERROR
      ERROR   with module.dns.google_dns_record_set.api_external[0],
      ERROR   on dns/base.tf line 22, in resource "google_dns_record_set" "api_external":
      ERROR   22: resource "google_dns_record_set" "api_external" {
      ERROR
      ERROR
      $ 
      $ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone qe-shared-vpc --filter='name~jiwei-1014-04'
      Listed 0 items.
      $ 
      
      $ yq-3.3.0 r work05/install-config.yaml platform
      gcp:
        projectID: openshift-qe
        region: us-central1
        computeSubnet: installer-shared-vpc-subnet-2
        controlPlaneSubnet: installer-shared-vpc-subnet-1
        createFirewallRules: Disabled
        publicDNSZone:
          id: qe-shared-vpc
          project: openshift-qe-shared-vpc
        network: installer-shared-vpc
        networkProjectID: openshift-qe-shared-vpc
      $ yq-3.3.0 r work05/install-config.yaml baseDomain
      qe-shared-vpc.qe.gcp.devcluster.openshift.com
      $ 
      $ openshift-install create cluster --dir work05
      INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" 
      INFO Consuming Install Config from target directory 
      FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": baseDomain: Internal error: no matching public DNS Zone found 
      $ 

       

       

       


            Patrick Dillon added a comment - rhn-support-jiwei we have outstanding work in CORS-2072 to enable the creation of *.apps. I'm closing this as not a bug because it looks like the original bug has been resolved.

            Jianli Wei added a comment - edited

            Tested with 4.12.0-0.nightly-2022-10-25-210451. Although the Scenario B installation succeeded, the "*.apps.<cluster name>.<base domain>" DNS record-set was not added to the public DNS zone in the host project. Please investigate, thanks!

            $ gcloud --project openshift-qe-shared-vpc dns managed-zones list --filter='name=qe-shared-vpc'
            NAME           DNS_NAME                                        DESCRIPTION  VISIBILITY
            qe-shared-vpc  qe-shared-vpc.qe.gcp.devcluster.openshift.com.               public
            $ gcloud dns managed-zones list --filter='dns_name=qe.gcp.devcluster.openshift.com.'
            NAME  DNS_NAME                          DESCRIPTION                  VISIBILITY
            qe    qe.gcp.devcluster.openshift.com.  Base Domain for QE clusters  public

            Scenario A: Use baseDomain of the service project, try IPI installation.

            $ yq-3.3.0 r test1/install-config.yaml platform
            gcp:
              projectID: openshift-qe
              region: us-central1
              computeSubnet: installer-shared-vpc-subnet-2
              controlPlaneSubnet: installer-shared-vpc-subnet-1
              createFirewallRules: Disabled
              publicDNSZone:
                id: qe-shared-vpc
                project: openshift-qe-shared-vpc
              network: installer-shared-vpc
              networkProjectID: openshift-qe-shared-vpc
            $ yq-3.3.0 r test1/install-config.yaml baseDomain
            qe.gcp.devcluster.openshift.com
            $ yq-3.3.0 r test1/install-config.yaml credentialsMode
            Passthrough
            $
            $ openshift-install create cluster --dir test1
            INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" 
            ERROR failed to fetch Metadata: failed to load asset "Install Config": failed to create install config: platform.gcp.publicDNSZone.id: Invalid value: "qe-shared-vpc": publicDNSZone does not exist in baseDomain qe.gcp.devcluster.openshift.com 

            Scenario B: Use baseDomain of the host project, try IPI installation. 

            $ yq-3.3.0 r test2/install-config.yaml platform
            gcp:
              projectID: openshift-qe
              region: us-central1
              computeSubnet: installer-shared-vpc-subnet-2
              controlPlaneSubnet: installer-shared-vpc-subnet-1
              createFirewallRules: Disabled
              publicDNSZone:
                id: qe-shared-vpc
                project: openshift-qe-shared-vpc
              network: installer-shared-vpc
              networkProjectID: openshift-qe-shared-vpc
            $ yq-3.3.0 r test2/install-config.yaml baseDomain
            qe-shared-vpc.qe.gcp.devcluster.openshift.com
            $ yq-3.3.0 r test2/install-config.yaml credentialsMode
            Passthrough
            $ yq-3.3.0 r test2/install-config.yaml publish
            External

            $ openshift-install create cluster --dir test2
            INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
            INFO Consuming Install Config from target directory
            WARNING FeatureSet "TechPreviewNoUpgrade" is enabled. This FeatureSet does not allow upgrades and may affect the supportability of the cluster.
            INFO Creating infrastructure resources...
            INFO Waiting up to 20m0s (until 2:38AM) for the Kubernetes API at https://api.jiwei-1027a.qe-shared-vpc.qe.gcp.devcluster.openshift.com:6443...
            INFO API v1.25.2+4bd0702 up
            INFO Waiting up to 30m0s (until 2:53AM) for bootstrapping to complete...
            INFO Destroying the bootstrap resources...
            INFO Waiting up to 40m0s (until 3:25AM) for the cluster at https://api.jiwei-1027a.qe-shared-vpc.qe.gcp.devcluster.openshift.com:6443 to initialize...
            INFO Checking to see if there is a route at openshift-console/console...
            INFO Install complete!
            INFO To access the cluster as the system:admin user when using 'oc', run 'export KUBECONFIG=/home/fedora/test2/auth/kubeconfig'
            INFO Access the OpenShift web-console here: https://console-openshift-console.apps.jiwei-1027a.qe-shared-vpc.qe.gcp.devcluster.openshift.com
            INFO Login to the console with user: "kubeadmin", and password: "ue6US-K2fFe-cfBtj-cx5YQ"
            INFO Time elapsed: 42m22s
            $ export KUBECONFIG=/home/fedora/test2/auth/kubeconfig
            $ oc get clusterversion
            NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
            version   4.12.0-0.nightly-2022-10-25-210451   True        False         81s     Cluster version is 4.12.0-0.nightly-2022-10-25-210451
            $ oc get nodes
            NAME                                                       STATUS   ROLES                  AGE   VERSION
            jiwei-1027a-sxsjb-master-0.c.openshift-qe.internal         Ready    control-plane,master   46m   v1.25.2+4bd0702
            jiwei-1027a-sxsjb-master-1.c.openshift-qe.internal         Ready    control-plane,master   45m   v1.25.2+4bd0702
            jiwei-1027a-sxsjb-master-2.c.openshift-qe.internal         Ready    control-plane,master   45m   v1.25.2+4bd0702
            jiwei-1027a-sxsjb-worker-a-rqd6m.c.openshift-qe.internal   Ready    worker                 25m   v1.25.2+4bd0702
            jiwei-1027a-sxsjb-worker-b-v8gbf.c.openshift-qe.internal   Ready    worker                 25m   v1.25.2+4bd0702

            $ gcloud dns record-sets list --zone jiwei-1027a-sxsjb-private-zone --format="table(type,name,rrdatas)" --filter="type=A OR type=SRV"
            TYPE  NAME                                                                RRDATAS
            A     api.jiwei-1027a.qe-shared-vpc.qe.gcp.devcluster.openshift.com.      ['10.0.0.6']
            A     api-int.jiwei-1027a.qe-shared-vpc.qe.gcp.devcluster.openshift.com.  ['10.0.0.6']
            A     *.apps.jiwei-1027a.qe-shared-vpc.qe.gcp.devcluster.openshift.com.   ['34.68.21.56']
            $ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone qe-shared-vpc --format="table(type,name,rrdatas)" --filter="name~jiwei-1027"
            TYPE  NAME                                                            RRDATAS
            A     api.jiwei-1027a.qe-shared-vpc.qe.gcp.devcluster.openshift.com.  ['35.224.175.244']

            $ gcloud --project openshift-qe-shared-vpc dns record-sets list --zone qe-shared-vpc --format="table(type,name,rrdatas)" --filter="type=A"
            TYPE  NAME                                                                RRDATAS
            A     api.jiwei-1027a.qe-shared-vpc.qe.gcp.devcluster.openshift.com.      ['35.224.175.244']
            A     api.jmekkatt-ins.qe-shared-vpc.qe.gcp.devcluster.openshift.com.     ['34.172.95.231']
            A     *.apps.jmekkatt-ins.qe-shared-vpc.qe.gcp.devcluster.openshift.com.  ['34.170.218.171']
            A     api.newugd-20723.qe-shared-vpc.qe.gcp.devcluster.openshift.com.     ['34.66.196.163']
            A     *.apps.newugd-20723.qe-shared-vpc.qe.gcp.devcluster.openshift.com.  ['35.194.0.22']

             


            Patrick Dillon added a comment - Moving this back to ON_QA. Brent tested and WORKSFORME. Looks like we need to use a later build.

            Brent Barbachem added a comment - rhn-support-jiwei, can you retest with a build later than 10/6/2022? This PR (https://github.com/openshift/installer/pull/6300) merged on 10/5/2022, and the base domain validation is different in builds that include it.
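Whether a given nightly is recent enough can be read off its version string. This is a minimal sketch under the assumption that nightly versions always embed the build date as `nightly-YYYY-MM-DD-`, as the two versions quoted in this issue do; `nightly_is_after` is a hypothetical helper, not an installer command.

```shell
# Hypothetical helper: succeed when a nightly version string carries a build
# date strictly later than the cutoff date.
# $1 = version such as 4.12.0-0.nightly-2022-10-05-053337, $2 = cutoff (YYYY-MM-DD)
nightly_is_after() {
  build_date=$(printf '%s\n' "$1" |
    sed -n 's/.*nightly-\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\)-.*/\1/p')
  # Return 2 when the string does not look like a nightly version.
  [ -n "$build_date" ] || return 2
  # ISO dates compare correctly as integers once the dashes are removed.
  [ "$(printf '%s' "$build_date" | tr -d '-')" -gt "$(printf '%s' "$2" | tr -d '-')" ]
}

for v in 4.12.0-0.nightly-2022-10-05-053337 4.12.0-0.nightly-2022-10-25-210451; do
  if nightly_is_after "$v" 2022-10-06; then
    echo "$v: later than the cutoff"
  else
    echo "$v: not later than the cutoff"
  fi
done
```

The build date alone does not prove that a particular PR is included in the payload; it is only a quick first filter.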

            Patrick Dillon added a comment - For scenario B: we should ensure the validation function uses the correct project when checking that the DNS zone exists.

            We need to investigate further what the error in scenario A means.
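Both points can be illustrated with a small sketch. It rests on two assumptions that this thread does not confirm: that the zone lookup should use `platform.gcp.publicDNSZone.project` when it is set and fall back to `platform.gcp.projectID` otherwise, and that the Scenario A error arises because the record name does not lie inside the DNS name of the zone it is being added to. `zone_lookup_project` and `record_in_zone` are hypothetical helpers, not installer functions.

```shell
# Hypothetical helper: pick the project in which to look up the public zone.
# $1 = platform.gcp.projectID, $2 = platform.gcp.publicDNSZone.project (may be empty)
zone_lookup_project() {
  printf '%s\n' "${2:-$1}"
}

# Hypothetical helper: succeed when a record name lies inside a zone.
# $1 = record FQDN, $2 = zone DNS name, both with a trailing dot
record_in_zone() {
  case "$1" in
    "$2" | *."$2") return 0 ;;
    *) return 1 ;;
  esac
}

# Scenario B: the zone lives in the host project, not in projectID.
zone_lookup_project openshift-qe openshift-qe-shared-vpc

# Scenario A: the api record is built from baseDomain but added to the
# qe-shared-vpc zone, whose DNS name is a different subdomain.
record='api.jiwei-1014-04.qe.gcp.devcluster.openshift.com.'
if record_in_zone "$record" 'qe-shared-vpc.qe.gcp.devcluster.openshift.com.'; then
  echo "record fits the zone"
else
  echo "record is outside the zone"
fi
```

With the values from this report, the first call prints `openshift-qe-shared-vpc` and the check reports that the record is outside the zone.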

              rh-ee-bbarbach Brent Barbachem
              rhn-support-jiwei Jianli Wei
              Jianli Wei Jianli Wei