OpenShift Bugs / OCPBUGS-29929

[gcp] Destroying the problem cluster unexpectedly deletes DNS record-sets not created by the installer


    • Moderate
    • No
    • 2
    • Sprint 253
    • 1
    • Rejected
    • False



      This is a clone of issue OCPBUGS-27156. The following is the description of the original issue:

      Description of problem:

         Creating a second cluster with the same cluster name and base domain as the first cluster fails, as expected, because of conflicting DNS record-sets. However, destroying the second cluster makes the first cluster inaccessible, which is unexpected.

      Version-Release number of selected component (if applicable):

          4.15.0-0.nightly-2024-01-14-100410

      How reproducible:


      Steps to Reproduce:

      1. Create the first cluster and confirm that it succeeds.
      2. Try to create a second cluster with the same cluster name, base domain, and region, and confirm that it fails.
      3. Destroy the second cluster, which failed at the "Platform Provisioning Check".
      4. Check whether the first cluster is still healthy.
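Step 4 can be scripted with the same `gcloud dns record-sets list` query used in the additional info. The following is a minimal Python sketch, assuming `gcloud` is installed and authenticated for the project; the helper names `parse_records`, `list_records`, and `api_record_present` are hypothetical, and `--format=value(name,type)` is used so that each record-set prints as one whitespace-separated line.

```python
import subprocess


def parse_records(text: str) -> list:
    """Parse `--format=value(name,type)` output into (name, type) pairs."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            records.append((fields[0], fields[1]))
    return records


def list_records(zone: str, name_filter: str) -> list:
    """Run gcloud and return the (name, type) pairs of matching record-sets."""
    result = subprocess.run(
        [
            "gcloud", "dns", "record-sets", "list",
            f"--zone={zone}",
            f"--filter=name~{name_filter}",
            "--format=value(name,type)",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_records(result.stdout)


def api_record_present(records: list, cluster_domain: str) -> bool:
    # `oc` resolves the API server through api.<cluster domain>, so a missing
    # record explains the "no such host" lookup failure.
    return any(name == f"api.{cluster_domain}." for name, _ in records)
```

For example, `api_record_present(list_records("qe", "jiwei-0115y"), "jiwei-0115y.qe.gcp.devcluster.openshift.com")` would return False in the state captured in step (4) of the additional info, where the listing for zone qe is empty.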

      Actual results:

          The first cluster becomes unhealthy because its DNS record-sets are deleted in step 3.

      Expected results:

          The DNS record-sets of the first cluster remain untouched during step 3, and the first cluster stays healthy afterward.
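The destroy log in the additional info shows that the installer skips deleting the private zone jiwei-0115y-lgns8-private-zone as "not created by installer", yet still deletes the record-sets inside it and in zone qe. A minimal Python sketch of the guard the expected results imply, assuming ownership is read from the zone's `kubernetes-io-cluster-<infraID>: owned` label shown in step (4). `ManagedZone`, `owned_by`, `should_delete_records`, and the infra ID `jiwei-0115y-abcde` are hypothetical; this is an illustration, not the installer's code.

```python
from dataclasses import dataclass, field


@dataclass
class ManagedZone:
    """Hypothetical, minimal view of a Cloud DNS managed zone."""
    name: str
    dns_name: str
    labels: dict = field(default_factory=dict)


def owned_by(zone: ManagedZone, infra_id: str) -> bool:
    # Ownership is read from the "kubernetes-io-cluster-<infraID>: owned" label.
    return zone.labels.get(f"kubernetes-io-cluster-{infra_id}") == "owned"


def should_delete_records(private_zone: ManagedZone, infra_id: str) -> bool:
    # Hypothetical guard: delete the cluster's record-sets (in the private zone
    # and in the public base-domain zone) only when the private zone for the
    # cluster domain belongs to the cluster being destroyed.
    return owned_by(private_zone, infra_id)


# The first cluster's private zone, with the label shown in step (4).
zone = ManagedZone(
    name="jiwei-0115y-lgns8-private-zone",
    dns_name="jiwei-0115y.qe.gcp.devcluster.openshift.com.",
    labels={"kubernetes-io-cluster-jiwei-0115y-lgns8": "owned"},
)

print(should_delete_records(zone, "jiwei-0115y-lgns8"))  # True: first cluster's own destroy
print(should_delete_records(zone, "jiwei-0115y-abcde"))  # False: hypothetical second-cluster infra ID
```

Tying record-set deletion to the same ownership signal that already governs zone deletion keeps the two decisions consistent, so a destroy run that declines to delete a zone also leaves that zone's records alone.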

      Additional info:

      (1) The first cluster was created by Flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/257549/ and is initially healthy:
      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.15.0-0.nightly-2024-01-14-100410   True        False         54m     Cluster version is 4.15.0-0.nightly-2024-01-14-100410
      $ oc get nodes
      NAME                                                       STATUS   ROLES                  AGE   VERSION
      jiwei-0115y-lgns8-master-0.c.openshift-qe.internal         Ready    control-plane,master   73m   v1.28.5+c84a6b8
      jiwei-0115y-lgns8-master-1.c.openshift-qe.internal         Ready    control-plane,master   73m   v1.28.5+c84a6b8
      jiwei-0115y-lgns8-master-2.c.openshift-qe.internal         Ready    control-plane,master   74m   v1.28.5+c84a6b8
      jiwei-0115y-lgns8-worker-a-gqq96.c.openshift-qe.internal   Ready    worker                 62m   v1.28.5+c84a6b8
      jiwei-0115y-lgns8-worker-b-2h9xd.c.openshift-qe.internal   Ready    worker                 63m   v1.28.5+c84a6b8
      (2) Try to create the second cluster, expecting it to fail because the DNS record already exists:
      $ openshift-install version
      openshift-install 4.15.0-0.nightly-2024-01-14-100410
      built from commit b6f320ab7eeb491b2ef333a16643c140239de0e5
      release image registry.ci.openshift.org/ocp/release@sha256:385d84c803c776b44ce77b80f132c1b6ed10bd590f868c97e3e63993b811cc2d
      release architecture amd64
      $ mkdir test1
      $ cp install-config.yaml test1
      $ yq-3.3.0 r test1/install-config.yaml baseDomain
      $ yq-3.3.0 r test1/install-config.yaml metadata
      creationTimestamp: null
      name: jiwei-0115y
      $ yq-3.3.0 r test1/install-config.yaml platform
        projectID: openshift-qe
        region: us-central1
      $ openshift-install create cluster --dir test1
      INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" 
      INFO Consuming Install Config from target directory 
      FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": metadata.name: Invalid value: "jiwei-0115y": record(s) ["api.jiwei-0115y.qe.gcp.devcluster.openshift.com."] already exists in DNS Zone (openshift-qe/qe) and might be in use by another cluster, please remove it to continue 
      (3) Destroy the second cluster:
      $ openshift-install destroy cluster --dir test1
      INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json" 
      INFO Deleted 2 recordset(s) in zone qe            
      INFO Deleted 3 recordset(s) in zone jiwei-0115y-lgns8-private-zone 
      WARNING Skipping deletion of DNS Zone jiwei-0115y-lgns8-private-zone, not created by installer 
      INFO Time elapsed: 37s                            
      INFO Uninstallation complete!                     
      (4) Check the first cluster's status and its DNS record-sets:
      $ oc get clusterversion
      Unable to connect to the server: dial tcp: lookup api.jiwei-0115y.qe.gcp.devcluster.openshift.com on no such host
      $ gcloud dns managed-zones describe jiwei-0115y-lgns8-private-zone
        kind: dns#managedZoneCloudLoggingConfig
      creationTime: '2024-01-15T07:22:55.199Z'
      description: Created By OpenShift Installer
      dnsName: jiwei-0115y.qe.gcp.devcluster.openshift.com.
      id: '9193862213315831261'
      kind: dns#managedZone
        kubernetes-io-cluster-jiwei-0115y-lgns8: owned
      name: jiwei-0115y-lgns8-private-zone
      - ns-gcp-private.googledomains.com.
        kind: dns#managedZonePrivateVisibilityConfig
        - kind: dns#managedZonePrivateVisibilityConfigNetwork
          networkUrl: https://www.googleapis.com/compute/v1/projects/openshift-qe/global/networks/jiwei-0115y-lgns8-network
      visibility: private
      $ gcloud dns record-sets list --zone jiwei-0115y-lgns8-private-zone
      NAME                                          TYPE  TTL    DATA
      jiwei-0115y.qe.gcp.devcluster.openshift.com.  NS    21600  ns-gcp-private.googledomains.com.
      jiwei-0115y.qe.gcp.devcluster.openshift.com.  SOA   21600  ns-gcp-private.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
      $ gcloud dns record-sets list --zone qe --filter='name~jiwei-0115y'
      Listed 0 items.

            rh-ee-bbarbach Brent Barbachem
            openshift-crt-jira-prow OpenShift Prow Bot
            Jianli Wei Jianli Wei