Description of problem:
Creating a second cluster with the same cluster name and base domain as an existing cluster fails, as expected, because of the DNS record-set conflict. However, destroying the failed second cluster makes the first cluster inaccessible, which is unexpected.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2024-01-14-100410
How reproducible:
Always
Steps to Reproduce:
1. Create the first cluster and make sure it succeeds.
2. Try to create the second cluster with the same cluster name, base domain, and region, and make sure it fails.
3. Destroy the second cluster, which failed at the "Platform Provisioning Check".
4. Check whether the first cluster is still healthy.
Actual results:
The first cluster becomes unhealthy because its DNS record-sets are deleted in step 3.
Expected results:
The DNS record-sets of the first cluster stay untouched during step 3, and the first cluster stays healthy after step 3.
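The expected behavior above amounts to an ownership check before deletion. The following is a minimal sketch of that idea only, not the installer's actual code: `RecordSet`, `select_deletable`, and the notion of a per-install ownership list are hypothetical names introduced for illustration, and the `A` record type is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordSet:
    """A DNS record-set identified by zone, name, and type."""
    zone: str
    name: str
    rtype: str


def select_deletable(matched_by_name, created_by_this_install):
    """Keep only the name-matched record-sets that this installation created.

    matched_by_name: record-sets found by cluster name and base domain.
    created_by_this_install: hypothetical ownership list kept by the install.
    """
    owned = set(created_by_this_install)
    return [rs for rs in matched_by_name if rs in owned]


# Record name taken from the report; the "A" type is assumed for illustration.
api = RecordSet("qe", "api.jiwei-0115y.qe.gcp.devcluster.openshift.com.", "A")

# The second install failed before creating anything, so it owns nothing
# and the destroy step should leave the first cluster's record untouched.
print(select_deletable([api], created_by_this_install=[]))  # prints []
```

With a guard of this kind, a destroy run for an install that failed at the "Platform Provisioning Check" would select nothing for deletion, because matching on name alone is not treated as proof of ownership.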
Additional info:
(1) The first cluster was created by Flexy-install job https://mastern-jenkins-csb-openshift-qe.apps.ocp-c1.prod.psi.redhat.com/job/ocp-common/job/Flexy-install/257549/, and it is healthy initially.

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-01-14-100410   True        False         54m     Cluster version is 4.15.0-0.nightly-2024-01-14-100410
$ oc get nodes
NAME                                                       STATUS   ROLES                  AGE   VERSION
jiwei-0115y-lgns8-master-0.c.openshift-qe.internal         Ready    control-plane,master   73m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-master-1.c.openshift-qe.internal         Ready    control-plane,master   73m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-master-2.c.openshift-qe.internal         Ready    control-plane,master   74m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-worker-a-gqq96.c.openshift-qe.internal   Ready    worker                 62m   v1.28.5+c84a6b8
jiwei-0115y-lgns8-worker-b-2h9xd.c.openshift-qe.internal   Ready    worker                 63m   v1.28.5+c84a6b8
$

(2) Try to create the second cluster; it is expected to fail because the DNS record already exists.

$ openshift-install version
openshift-install 4.15.0-0.nightly-2024-01-14-100410
built from commit b6f320ab7eeb491b2ef333a16643c140239de0e5
release image registry.ci.openshift.org/ocp/release@sha256:385d84c803c776b44ce77b80f132c1b6ed10bd590f868c97e3e63993b811cc2d
release architecture amd64
$ mkdir test1
$ cp install-config.yaml test1
$ yq-3.3.0 r test1/install-config.yaml baseDomain
qe.gcp.devcluster.openshift.com
$ yq-3.3.0 r test1/install-config.yaml metadata
creationTimestamp: null
name: jiwei-0115y
$ yq-3.3.0 r test1/install-config.yaml platform
gcp:
  projectID: openshift-qe
  region: us-central1
$ openshift-install create cluster --dir test1
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Consuming Install Config from target directory
FATAL failed to fetch Terraform Variables: failed to fetch dependency of "Terraform Variables": failed to generate asset "Platform Provisioning Check": metadata.name: Invalid value: "jiwei-0115y": record(s) ["api.jiwei-0115y.qe.gcp.devcluster.openshift.com."] already exists in DNS Zone (openshift-qe/qe) and might be in use by another cluster, please remove it to continue
$

(3) Delete the second cluster.

$ openshift-install destroy cluster --dir test1
INFO Credentials loaded from file "/home/fedora/.gcp/osServiceAccount.json"
INFO Deleted 2 recordset(s) in zone qe
INFO Deleted 3 recordset(s) in zone jiwei-0115y-lgns8-private-zone
WARNING Skipping deletion of DNS Zone jiwei-0115y-lgns8-private-zone, not created by installer
INFO Time elapsed: 37s
INFO Uninstallation complete!
$

(4) Check the first cluster's status and its DNS record-sets.

$ oc get clusterversion
Unable to connect to the server: dial tcp: lookup api.jiwei-0115y.qe.gcp.devcluster.openshift.com on 10.11.5.160:53: no such host
$
$ gcloud dns managed-zones describe jiwei-0115y-lgns8-private-zone
cloudLoggingConfig:
  kind: dns#managedZoneCloudLoggingConfig
creationTime: '2024-01-15T07:22:55.199Z'
description: Created By OpenShift Installer
dnsName: jiwei-0115y.qe.gcp.devcluster.openshift.com.
id: '9193862213315831261'
kind: dns#managedZone
labels:
  kubernetes-io-cluster-jiwei-0115y-lgns8: owned
name: jiwei-0115y-lgns8-private-zone
nameServers:
- ns-gcp-private.googledomains.com.
privateVisibilityConfig:
  kind: dns#managedZonePrivateVisibilityConfig
  networks:
  - kind: dns#managedZonePrivateVisibilityConfigNetwork
    networkUrl: https://www.googleapis.com/compute/v1/projects/openshift-qe/global/networks/jiwei-0115y-lgns8-network
visibility: private
$ gcloud dns record-sets list --zone jiwei-0115y-lgns8-private-zone
NAME                                          TYPE  TTL    DATA
jiwei-0115y.qe.gcp.devcluster.openshift.com.  NS    21600  ns-gcp-private.googledomains.com.
jiwei-0115y.qe.gcp.devcluster.openshift.com.  SOA   21600  ns-gcp-private.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
$ gcloud dns record-sets list --zone qe --filter='name~jiwei-0115y'
Listed 0 items.
$
- blocks: OCPBUGS-29929 [gcp] destroying the problem cluster unexpectedly deletes the dns record-sets not created by the installer (Closed)
- is cloned by: OCPBUGS-29929 [gcp] destroying the problem cluster unexpectedly deletes the dns record-sets not created by the installer (Closed)
- links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update