Resolution: Done
Description of problem:
CSI restore fails on a new cluster with Datamover enabled on both clusters.
# oc logs csi-snapshot-controller-7b4dbd9b55-9xx4v -n openshift-cluster-storage-operator
I0923 15:59:07.974528 1 main.go:125] Version: v4.12.0-202208231618.p0.g5a93140.assembly.stream-0-gb48c7b0-dirty
I0923 15:59:07.984341 1 main.go:174] Start NewCSISnapshotController with kubeconfig [] resyncPeriod [15m0s]
E0923 16:00:08.139182 1 main.go:86] Failed to list v1 volumesnapshots with error=Get "https://172.30.0.1:443/apis/snapshot.storage.k8s.io/v1/volumesnapshots": dial tcp 172.30.0.1:443: i/o timeout
I0923 16:00:14.047264 1 leaderelection.go:248] attempting to acquire leader lease openshift-cluster-storage-operator/snapshot-controller-leader...
I0923 16:00:14.326347 1 leaderelection.go:258] successfully acquired lease openshift-cluster-storage-operator/snapshot-controller-leader
I0923 16:00:14.327196 1 leader_election.go:178] became leader, starting
I0923 16:00:14.340326 1 snapshot_controller_base.go:152] Starting snapshot controller
I0929 10:59:47.167089 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-w9mr7 was successfully created by the CSI driver.
I0929 10:59:47.167167 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-w9mr7", UID:"53c47235-2a11-4394-888f-8167e4dee785", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"7815207", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-w9mr7 is ready to use.
I1006 16:05:07.772888 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
I1006 16:05:07.772958 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"6f5ca640-0dfa-4a9c-8696-7a52fc93bae3", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"20359659", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.
I1008 10:01:56.228759 1 snapshot_controller_base.go:269] deletion of snapshot "ns-3abd639f-018b-489d-b3b4-ba9a02fa1782/name-3abd639f-018b-489d-b3b4-ba9a02fa1782" was already processed
I1008 10:45:51.057537 1 snapshot_controller_base.go:269] deletion of snapshot "ns-b76a37bb-d8d5-4141-a494-2299231f5d9f/name-b76a37bb-d8d5-4141-a494-2299231f5d9f" was already processed
E1010 09:27:06.437440 1 snapshot_controller_base.go:403] could not sync snapshot "ocp-mysql/velero-mysql-c7ckk": snapshot controller failed to update velero-mysql-c7ckk on API server: Operation cannot be fulfilled on volumesnapshots.snapshot.storage.k8s.io "velero-mysql-c7ckk": StorageError: invalid object, Code: 4, Key: /kubernetes.io/snapshot.storage.k8s.io/volumesnapshots/ocp-mysql/velero-mysql-c7ckk, ResourceVersion: 0, AdditionalErrorMsg: Precondition failed: UID in precondition: 6f5ca640-0dfa-4a9c-8696-7a52fc93bae3, UID in object meta:
I1010 09:27:07.440434 1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
I1010 09:27:36.803123 1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-c7ckk" was already processed
I1010 09:27:47.630757 1 snapshot_controller_base.go:269] deletion of snapshot "ocp-mysql/velero-mysql-w9mr7" was already processed
I1010 09:29:52.134217 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotCreated' Snapshot ocp-mysql/velero-mysql-c7ckk was successfully created by the CSI driver.
I1010 09:29:52.134265 1 event.go:285] Event(v1.ObjectReference{Kind:"VolumeSnapshot", Namespace:"ocp-mysql", Name:"velero-mysql-c7ckk", UID:"7d3aae75-3012-4486-bfe4-2c3cda70ffc7", APIVersion:"snapshot.storage.k8s.io/v1", ResourceVersion:"27129164", FieldPath:""}): type: 'Normal' reason: 'SnapshotReady' Snapshot ocp-mysql/velero-mysql-c7ckk is ready to use.
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Install VolSync from the latest stable channel in OperatorHub on both clusters (a sample Subscription is sketched below)
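A minimal OLM Subscription sketch for this step; the package name, channel, and catalog source below are assumptions and may differ in your catalog (verify with: oc get packagemanifests | grep -i volsync):
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: volsync-product
  namespace: openshift-operators
spec:
  channel: stable                        # "latest stable channel" from this step
  installPlanApproval: Automatic
  name: volsync-product                  # assumed package name
  source: redhat-operators               # assumed catalog source
  sourceNamespace: openshift-marketplace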
2. Create a secret
$ oc create secret generic <secret-name> -n openshift-adp --from-file cloud=<credentials file path>
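The credentials file passed via --from-file is the usual AWS-style shared credentials file; a minimal sketch with placeholder values (the file name credentials-velero is an example, and the secret name cloud-credentials matches the DPA in step 3):
$ cat credentials-velero
[default]
aws_access_key_id=<AWS_ACCESS_KEY_ID>
aws_secret_access_key=<AWS_SECRET_ACCESS_KEY>
$ oc create secret generic cloud-credentials -n openshift-adp --from-file cloud=credentials-velero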
3. Create a DPA instance with dataMover: {enable: true} and the csi plugin included in the defaultPlugins list on both clusters.
For example:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  name: example-velero
  namespace: openshift-adp
spec:
  backupLocations:
    - velero:
        config:
          profile: default
          region: eu-central-1
        credential:
          key: cloud
          name: cloud-credentials
        default: true
        objectStorage:
          bucket: myoadptestbucket
          prefix: velero
        provider: aws
  configuration:
    velero:
      defaultPlugins:
        - openshift
        - aws
        - kubevirt
        - csi
  features:
    dataMover:
      enable: true
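After the DPA reconciles, a quick sanity check on both clusters (a sketch; exact pod names vary by OADP version, but the velero pod and the Data Mover / VolSync-related pods should be Running):
# oc get dataprotectionapplication -n openshift-adp
# oc get pods -n openshift-adp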
4. Edit the volumesnapshotclass so that deletionPolicy is Retain and the velero.io/csi-volumesnapshot-class: "true" label is set, as in the following output (a sketch of the commands to apply this follows it):
oc get vsclass ocs-storagecluster-cephfsplugin-snapclass -oyaml
apiVersion: snapshot.storage.k8s.io/v1
deletionPolicy: Retain
driver: openshift-storage.cephfs.csi.ceph.com
kind: VolumeSnapshotClass
metadata:
  creationTimestamp: "2022-09-26T08:33:11Z"
  generation: 2
  labels:
    velero.io/csi-volumesnapshot-class: "true"
  name: ocs-storagecluster-cephfsplugin-snapclass
  resourceVersion: "1710327"
  uid: 9185ea19-f13f-479f-8d7a-68da472d7d2a
parameters:
  clusterID: openshift-storage
  csi.storage.k8s.io/snapshotter-secret-name: rook-csi-cephfs-provisioner
  csi.storage.k8s.io/snapshotter-secret-namespace: openshift-storage
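A sketch of one way to apply those two changes (set the velero.io label and switch deletionPolicy to Retain); adjust the class name for your storage driver:
# oc label volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass velero.io/csi-volumesnapshot-class="true"
# oc patch volumesnapshotclass ocs-storagecluster-cephfsplugin-snapclass --type merge -p '{"deletionPolicy":"Retain"}'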
5. Deploy a sample application
# oc get all,pvc -n ocp-mysql
NAME                        READY   STATUS    RESTARTS   AGE
pod/mysql-6d64ccd47-b4vwr   1/1     Running   0          3d19h

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/mysql   ClusterIP   172.30.175.178   <none>        3306/TCP   3d19h

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql   1/1     1            1           3d19h

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-6d64ccd47   1         1         1       3d19h

NAME                          STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                AGE
persistentvolumeclaim/mysql   Bound    pvc-fd0159b3-f7ba-4fb0-9a76-e7f8fe988457   2Gi        RWO            ocs-storagecluster-cephfs   3d19h
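For reference, a minimal PVC matching the claim shown above (a sketch, not the exact manifest used; the mysql Deployment and Service are omitted):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql
  namespace: ocp-mysql
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: ocs-storagecluster-cephfs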
6. Create a backup on cluster1
# cat backup.yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: velero-sample1-withcsi-ocpmysql
  labels:
    velero.io/storage-location: default
  namespace: openshift-adp
spec:
  hooks: {}
  includedNamespaces:
    - ocp-mysql
  includeClusterResources: false
  storageLocation: velero-sample-1
  ttl: 720h0m0s
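The Backup CR already carries its namespace, so it can be applied directly and then watched until .status.phase reaches Completed (the checks below do exactly that):
# oc create -f backup.yaml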
# oc get backup -n openshift-adp
NAME                                          AGE
velero-sample-withcsi-ocpmysql                3d19h
velero-sample-withcsi-sep19                   3d19h
velero-sample-withcsi-sep20acmeair            3d19h
velero-sample-withcsi-sep21fileuploader       3d19h
velero-sample-withcsisep26-sepfileuploader    3d19h
velero-sample-witherestic-sep22fileuploader   3d19h
velero-sample1-withcsi-ocpmysql               3d19h
# oc get backup -n openshift-adp velero-sample1-withcsi-ocpmysql -o jsonpath={.status.phase}
Completed
7. Check for the volumesnapshotcontent on the cluster
# oc get volumesnapshotcontents | grep ocp-mysql
snapcontent-a7737939-fe72-45f7-b41f-83575698681e   true   2147483648   Retain   openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk   ocp-mysql   3d19h
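Equivalently, the readiness and size of that snapcontent can be read field by field; a sketch using the name from the output above:
# oc get volumesnapshotcontent snapcontent-a7737939-fe72-45f7-b41f-83575698681e -o jsonpath='{.status.readyToUse}{" "}{.status.restoreSize}{"\n"}'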
8. Restore the backup on the new cluster
# cat restore.yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: velero-restore1-withcsi-ocpmysql
  namespace: openshift-adp
spec:
  backupName: velero-sample1-withcsi-ocpmysql
  excludedResources:
    - nodes
    - events
    - events.events.k8s.io
    - backups.velero.io
    - restores.velero.io
    - resticrepositories.velero.io
  restorePVs: true
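As with the backup, the Restore CR is applied on the new cluster and its phase checked; a sketch:
# oc create -f restore.yaml
# oc get restore -n openshift-adp velero-restore1-withcsi-ocpmysql -o jsonpath={.status.phase}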
9. Check that the volumesnapshotcontent is copied to the new cluster
[root@m4202001 ~]# oc get volumesnapshotcontents | grep ocp-mysql
velero-velero-mysql-c7ckk-nd98j   true   0   Retain   openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass   velero-mysql-c7ckk   ocp-mysql   93m
10. Verify that the application is up and running
Actual results:
The PVC and the pod do not come up on the new cluster:
# oc get all,pvc -n ocp-mysql
NAME                        READY   STATUS    RESTARTS   AGE
pod/mysql-6d64ccd47-hlv4f   0/1     Pending   0          96m

NAME            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/mysql   ClusterIP   172.30.170.174   <none>        3306/TCP   96m

NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/mysql   0/1     1            0           96m

NAME                              DESIRED   CURRENT   READY   AGE
replicaset.apps/mysql-6d64ccd47   1         1         0       96m

NAME                          STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS                AGE
persistentvolumeclaim/mysql   Pending                                      ocs-storagecluster-cephfs   96m
# oc describe persistentvolumeclaim/mysql -nocp-mysql
Name:          mysql
Namespace:     ocp-mysql
StorageClass:  ocs-storagecluster-cephfs
Status:        Pending
Volume:
Labels:        app=mysql
               testlabel=selectors
               testlabel2=foo
               velero.io/backup-name=velero-sample1-withcsi-ocpmysql
               velero.io/restore-name=velero-restore1-withcsi-ocpmysql
               velero.io/volume-snapshot-name=velero-mysql-c7ckk
Annotations:   velero.io/backup-name: velero-sample1-withcsi-ocpmysql
               velero.io/volume-snapshot-name: velero-mysql-c7ckk
               volume.kubernetes.io/storage-provisioner: openshift-storage.cephfs.csi.ceph.com
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
DataSource:
  APIGroup:  snapshot.storage.k8s.io
  Kind:      VolumeSnapshot
  Name:      velero-mysql-c7ckk
Used By:       mysql-6d64ccd47-hlv4f
Events:
  Type     Reason                Age                    From                                                                                                                       Message
  ----     ------                ----                   ----                                                                                                                       -------
  Warning  ProvisioningFailed    75m (x14 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5   failed to provision volume with StorageClass "ocs-storagecluster-cephfs": rpc error: code = Internal desc = key not found: no snap source in omap for "csi.snap.82f2f23f-d274-4a09-bc68-f89a66e8c1c0"
  Normal   ExternalProvisioning  4m31s (x371 over 94m)  persistentvolume-controller                                                                                                waiting for a volume to be created, either by external provisioner "openshift-storage.cephfs.csi.ceph.com" or manually created by system administrator
  Normal   Provisioning          35s (x34 over 94m)     openshift-storage.cephfs.csi.ceph.com_csi-cephfsplugin-provisioner-7bdf9d685c-msbb7_c3e0ea1e-588f-4b96-8c1e-910f3f2534e5   External provisioner is provisioning volume for claim "ocp-mysql/mysql"
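The "no snap source in omap" error suggests the restored VolumeSnapshotContent references a snapshot handle that does not exist on the new cluster's Ceph backend; as a debugging sketch, the restored content from step 9 can be inspected to see what it actually points at:
# oc get volumesnapshotcontent velero-velero-mysql-c7ckk-nd98j -o yaml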
Expected results:
The application should be restored successfully on the new cluster
Additional info:
Slack threads: https://coreos.slack.com/archives/C0144ECKUJ0/p1666104590173779