Bug
Resolution: Done
Documentation Task summary:
The TL;DR is that a generated Route can have an incorrect hostname on restore if the hostname had been modified.
This is a suggested known-issue draft from tkaovila@redhat.com; please modify it as you see fit.
A generated Route can have an incorrect hostname on restore if the hostname had been modified
A generated Route that has the annotation "openshift.io/host.generated: 'true'" is assumed to have its .spec.host value populated by the cluster and left unmodified by the user. If the user has changed the host value from its generated value, that change can be lost on restore.
At this time, OADP has no mechanism to dynamically set a Route's host value based on the cluster base domain name for a non-generated Route.
Likewise, for a generated Route, the host value is stripped by the oadp-operator so that it can be regenerated on the restore cluster. Any modifications to .spec.host of a generated Route are therefore lost on restore.
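For illustration, a minimal sketch of a generated Route (the app name, namespace, and hostname are hypothetical). Because the annotation marks the host as cluster-generated, .spec.host is stripped and regenerated on the restore cluster, so a manually edited value like the one below is not preserved:

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: example-app            # hypothetical name
  namespace: example-ns        # hypothetical namespace
  annotations:
    # Marks the host as cluster-generated; OADP strips .spec.host on restore.
    openshift.io/host.generated: 'true'
spec:
  host: example-app-example-ns.apps.cluster1.example.com   # manually edited value, lost on restore
  to:
    kind: Service
    name: example-app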
== Original post below ==
Description of problem:
CSI restore fails on a new cluster with Data Mover enabled on both clusters.
# oc logs csi-snapshot-controller-7b4dbd9b55-9xx4v -n openshift-cluster-storage-operator
I0923 15:59:07.974528 1 main.go:125] Version: v4.12.0-202208231618.p0.g5a93140.assembly.stream-0-gb48c7b0-dirty
I0923 15:59:07.984341 1 main.go:174] Start NewCSISnapshotController with kubeconfig [] resyncPeriod [15m0s]
E0923 16:00:08.139182 1 main.go:86] Failed to list v1 volumesnapshots with error=Get "https://172.30.0.1:443/apis/snapshot.storage.k8s.io/v1/volumesnapshots": dial tcp 172.30.0.1:443: i/o timeout
I0923 16:00:14.047264 1 leaderelection.go:248] attempting to acquire leader lease openshift-cluster-storage-operator/snapshot-controller-leader...
I0923 16:00:14.326347 1 leaderelection.go:258] successfully acquired lease openshift-cluster-storage-operator/snapshot-controller-leader
I0923 16:00:14.327196 1 leader_election.go:178] became leader, starting
I0923 16:00:14.340326 1 snapshot_controller_base.go:152] Starting snapshot controller
I0929 10:59:47.167089 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-w9mr7 was successfully created by the CSI driver.
I0929 10:59:47.167167 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-w9mr7 is ready to use.
I1006 16:05:07.772888 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
I1006 16:05:07.772958 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.
I1008 10:01:56.228759 1 snapshot_controller_base.go:269] deletion of snapshot "ns-3abd639f-018b-489d-b3b4-ba9a02fa1782/name-3abd639f-018b-489d-b3b4-ba9a02fa1782" was already processed
I1008 10:45:51.057537 1 snapshot_controller_base.go:269] deletion of snapshot "ns-b76a37bb-d8d5-4141-a494-2299231f5d9f/name-b76a37bb-d8d5-4141-a494-2299231f5d9f" was already processed
E1010 09:27:06.437440 1 snapshot_controller_base.go:403] could not sync snapshot "ocp-mysql/velero-mysql-c7ckk": snapshot controller failed to update velero-mysql-c7ckk on API server: Operation cannot be fulfilled on volumesnapshots.snapshot.storage.k8s.io "velero-mysql-c7ckk": StorageError: invalid object, Code: 4, Key: /kubernetes.io/snapshot.storage.k8s.io/volumesnapshots/ocp-mysql/velero-mysql-c7ckk, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 6f5ca640-0dfa-4a9c-8696-7a52fc93bae3, UID in object meta:
I1010 09:27:07.440434 1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
I1010 09:27:36.803123 1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
I1010 09:27:47.630757 1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-w9mr7" was already processed
I1010 09:29:52.134217 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
I1010 09:29:52.134265 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Install VolSync from the latest stable channel in OperatorHub on both clusters (for example, with a Subscription like the sketch below)
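A minimal OLM Subscription sketch for installing VolSync; the channel, package, and catalog source names are assumptions and should be verified against OperatorHub:

apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: volsync-product            # assumed package/subscription name
  namespace: openshift-operators
spec:
  channel: stable                  # assumed channel name
  name: volsync-product            # assumed package name
  source: redhat-operators         # assumed catalog source
  sourceNamespace: openshift-marketplace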
2. Create a secret
$ oc create secret generic <secret-name> -n openshift-adp --from-file cloud=<credentials file path>
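For example, a concrete invocation might look like the following; the secret name cloud-credentials matches the DPA example in step 3, and the credentials file path is hypothetical:

$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=./aws-credentials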
3. Create a DPA instance with dataMover: {enable: true} and the csi plugin listed in the defaultPlugins list on both clusters.
For example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: example-velero
  namespace: openshift-adp
spec:
  backupLocations:
  - velero:
      config:
        profile: default
        region: eu-central-1
      credential:
        key: cloud
        name: cloud-credentials
      default: true
      objectStorage:
        bucket: myoadptestbucket
        prefix: velero
      provider: aws
  configuration:
    velero:
      defaultPlugins:
      - openshift
      - aws
      - kubevirt
      - csi
  features:
    dataMover:
      enable: true
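After applying the DPA, the backup storage location can be checked before proceeding. This is a sketch; the file name dpa.yaml is hypothetical, and the PHASE of the BackupStorageLocation should report Available:

$ oc create -f dpa.yaml
$ oc get backupstoragelocations -n openshift-adp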
4. Edit the volumesnapshotclass as follows:
oc get vsclass ocs-storagecluster-cephfsplugin-snapclass -oyaml
apiVersion: snapshot.storage.k8s.io/v1
deletionPolicy: Retain
driver: openshift-storage.cephfs.csi.ceph.com
kind: VolumeSnapshotClass
metadata:
  creationTimestamp: "2022-09-26T08:33:11Z"
  generation: 2
  labels:
    velero.io/csi-volumesnapshot-class: "true"
  name: ocs-storagecluster-cephfsplugin-snapclass
  resourceVersion: "1710327"
  uid: 9185ea19-f13f-479f-8d7a-68da472d7d2a
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage
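One way to make these two edits without opening an editor is with oc label and oc patch; this is a sketch using the class name from the output above:

$ oc label volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass velero.io/csi-volumesnapshot-class="true"
$ oc patch volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass --type merge -p '{"deletionPolicy": "Retain"}'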
5. Deploy a sample application
# oc get all,pvc -n ocp-mysql
NAME                        READY   STATUS    RESTARTS   AGE
pod/mysql-6d64ccd47-b4vwr   1/1     Running   0          3d19h

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/mysql   ClusterIP   172.30.175.178   <none>        3306/TCP   3d19h

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql   1/1     1            1           3d19h

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-6d64ccd47   1         1         1       3d19h

NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
persistentvolumeclaim/mysql   Bound    pvc-fd0159b3-f7ba-4fb0-9a76-e7f8fe988457   2Gi        RWO            ocs-storagecluster-cephfs   3d19h
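The sample application manifests are not included in the report; a minimal PVC sketch consistent with the output above (name, namespace, size, access mode, and storage class are taken from it) would look like:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql
  namespace: ocp-mysql
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: ocs-storagecluster-cephfs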
6. Create backup on cluster1
# cat backup.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: velero-sample1-withcsi-ocpmysql
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  hooks: {}
  includedNamespaces:
  - ocp-mysql
  includeClusterResources: false
  storageLocation: velero-sample-1
  ttl: 720h0m0s
# oc get backup -n openshift-adp
NAME                                          AGE
velero-sample-withcsi-ocpmysql                3d19h
velero-sample-withcsi-sep19                   3d19h
velero-sample-withcsi-sep20acmeair            3d19h
velero-sample-withcsi-sep21fileuploader       3d19h
velero-sample-withcsisep26-sepfileuploader    3d19h
velero-sample-witherestic-sep22fileuploader   3d19h
velero-sample1-withcsi-ocpmysql               3d19h
# oc get backup -n openshift-adp velero-sample1-withcsi-ocpmysql -o jsonpath={.status.phase}
Completed
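Since the data mover is enabled, its intermediate custom resources can also be inspected at this point. This assumes the OADP 1.1 data mover CRDs (VolumeSnapshotBackup) are installed and that the resources are created in the namespace being backed up:

$ oc get volumesnapshotbackups -n ocp-mysql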
7. Check for the volumesnapshotcontent on the cluster
# oc get volumesnapshotcontents | grep ocp-mysql
snapcontent-a7737939-fe72-45f7-b41f-83575698681e   true   2147483648   Retain   openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk   ocp-mysql   3d19h
8. Restore the backup on the new cluster
# cat restore.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: velero-restore1-withcsi-ocpmysql
  namespace: openshift-adp
spec:
  backupName: velero-sample1-withcsi-ocpmysql
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  restorePVs: true
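Applying the restore and checking its phase mirrors the backup step; the commands below are a sketch following the same pattern as step 6:

$ oc create -f restore.yaml
$ oc get restore -n openshift-adp velero-restore1-withcsi-ocpmysql -o jsonpath={.status.phase}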
9. Check that the volumesnapshotcontent is copied to the new cluster
[root@m4202001 ~]# oc get volumesnapshotcontents | grep ocp-mysql
velero-velero-mysql-c7ckk-nd98j   true   0   Retain   openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk   ocp-mysql   93m
10. Verify that the application is up and running
Actual results:
The PVC and the pod do not come up on the new cluster.
# oc get all,pvc -n ocp-mysql
NAME                        READY   STATUS    RESTARTS   AGE
pod/mysql-6d64ccd47-hlv4f   0/1     Pending   0          96m

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/mysql   ClusterIP   172.30.170.174   <none>        3306/TCP   96m

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql   0/1     1            0           96m

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-6d64ccd47   1         1         0       96m

NAME                          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                AGE
persistentvolumeclaim/mysql   Pending                                      ocs-storagecluster-cephfs   96m
# oc describe persistentvolumeclaim/mysql -n ocp-mysql
Name:          mysql
Namespace:     ocp-mysql
StorageClass:  ocs-storagecluster-cephfs
Status:        Pending
Volume:
Labels:        app=mysql
               testlabel=selectors
               testlabel2=foo
               velero.io/backup-name=velero-sample1-withcsi-ocpmysql
               velero.io/restore-name=velero-restore1-withcsi-ocpmysql
               velero.io/volume-snapshot-name=velero-mysql-c7ckk
Annotations:   velero.io/backup-name: velero-sample1-withcsi-ocpmysql
               velero.io/volume-snapshot-name: velero-mysql-c7ckk
               volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:    snapshot.storage.k8s.io
  Kind:        VolumeSnapshot
  Name:        velero-mysql-c7ckk
Used By:       mysql-6d64ccd47-hlv4f
Events:
  Type     Reason                Age                    From                                                                                                                      Message
  ----     ------                ----                   ----                                                                                                                      -------
  Warning  ProvisioningFailed    75m (x14 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5  failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Internal desc = key not found: no snap source in omap for "csi.snap.82f2f23f-d274-4a09-bc68-f89a66e8c1c0"
  Normal   ExternalProvisioning  4m31s (x371 over 94m)  persistentvolume-controller                                                                                               waiting for a volume to be created, either by external provisioner "openshift-storage.cephfs.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          35s (x34 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5  External provisioner is provisioning volume for claim "ocp-mysql/mysql"
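The ProvisioningFailed message suggests the CSI driver cannot find snapshot data behind the restored VolumeSnapshotContent. One way to inspect what the restored content actually points at is to check its snapshot handle; this is a diagnostic sketch using the object name from step 9:

$ oc get volumesnapshotcontent velero-velero-mysql-c7ckk-nd98j -o jsonpath='{.status.snapshotHandle}'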
Expected results:
The application should be restored successfully on the new cluster
Additional info:
Slack threads: https://coreos.slack.com/archives/C0144ECKUJ0/p1666104590173779
- is related to OADP-943 Data mover restore successful on separate cluster with VolumeSnapshotContent deletionPolicy not set to Retain. (Closed)
- is triggered by OADP-989 Data Mover could restore PVCs with mover pod in a different node to the workload pod causing issues for PVCs with ReadWriteOnce accessMode (Closed)
- relates to OADP-1181 Update Release Notes Page & Known Issues (Testing)
- links to