-
Bug
-
Resolution: Done
-
Normal
-
OADP 1.0.3
Previously reported on https://bugzilla.redhat.com/show_bug.cgi?id=1951399
This issue is in the scope of OADP.
~~~
Description of problem (please be as detailed as possible and provide log snippets):
After restoring from an OADP backup that includes a CephFS CSI volume and then deleting the backup, a volumesnapshotcontent still exists. An attempt to delete it manually hangs.
oc delete volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
(hangs)
oc describe volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
Spec:
  Deletion Policy: Delete
  Driver: openshift-storage.cephfs.csi.ceph.com
  Source:
    Snapshot Handle: 0001-0011-openshift-storage-0000000000000001-7594b7ad-a172-11eb-ba3e-0a580afe17a8
  Volume Snapshot Class Name: ocs-storagecluster-cephfsplugin-snapclass-velero
  Volume Snapshot Ref:
    Kind: VolumeSnapshot
    Name: velero-demo-cephfs-pvc-vpl4t
    Namespace: testns
    UID: ce14ec3c-d8d6-4c83-a41a-f919a7d3966e
Status:
  Creation Time: 1618880071837960692
  Ready To Use: true
  Restore Size: 0
  Snapshot Handle: 0001-0011-openshift-storage-0000000000000001-7594b7ad-a172-11eb-ba3e-0a580afe17a8
Events:
  Type     Reason               Age                    From                                                   Message
  ----     ------               ----                   ----                                                   -------
  Warning  SnapshotDeleteError  79m (x143 over 3h20m)  csi-snapshotter openshift-storage.cephfs.csi.ceph.com  Failed to delete snapshot
  Warning  SnapshotDeleteError  3m23s (x90 over 74m)   csi-snapshotter openshift-storage.cephfs.csi.ceph.com  Failed to delete snapshot
oc logs csi-cephfsplugin-provisioner-66c59d467f-ggwpd -c csi-snapshotter
I0420 01:08:31.456278 1 reflector.go:369] github.com/kubernetes-csi/external-snapshotter/client/v3/informers/externalversions/factory.go:117: forcing resync
I0420 01:08:31.456388 1 snapshot_controller_base.go:140] enqueued "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj" for sync
I0420 01:08:31.456421 1 snapshot_controller_base.go:174] syncContentByKey[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]
I0420 01:08:31.456443 1 util.go:258] storeObjectUpdate updating content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj" with version 82402937
I0420 01:08:31.456456 1 snapshot_controller.go:57] synchronizing VolumeSnapshotContent[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]
I0420 01:08:31.456497 1 snapshot_controller.go:531] Check if VolumeSnapshotContent[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj] should be deleted.
I0420 01:08:31.456524 1 snapshot_controller.go:60] VolumeSnapshotContent[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]: the policy is Delete
I0420 01:08:31.456532 1 snapshot_controller.go:92] Deleting snapshot for content: velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
I0420 01:08:31.456537 1 snapshot_controller.go:329] deleteCSISnapshotOperation [velero-velero-demo-cephfs-pvc-vpl4t-rdnbj] started
I0420 01:08:31.456542 1 snapshot_controller.go:181] getCSISnapshotInput for content [velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]
I0420 01:08:31.456546 1 snapshot_controller.go:439] getSnapshotClass: VolumeSnapshotClassName [ocs-storagecluster-cephfsplugin-snapclass-velero]
E0420 01:08:31.457834 1 snapshot_controller_base.go:261] could not sync content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj": failed to delete snapshot "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", err: failed to delete snapshot content velero-velero-demo-cephfs-pvc-vpl4t-rdnbj: "rpc error: code = InvalidArgument desc = provided secret is empty"
I0420 01:08:31.457873 1 snapshot_controller_base.go:163] Failed to sync content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", will retry again: failed to delete snapshot "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", err: failed to delete snapshot content velero-velero-demo-cephfs-pvc-vpl4t-rdnbj: "rpc error: code = InvalidArgument desc = provided secret is empty"
I0420 01:08:31.458124 1 event.go:282] Event(v1.ObjectReference): type: 'Warning' reason: 'SnapshotDeleteError' Failed to delete snapshot
Version of all relevant components (if applicable):
OADP 0.2.0 with CSI plugin
OCP 4.6.9
OCS 4.6.4
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
If a volumesnapshotcontent cannot be deleted, it's possible that storage usage keeps increasing even though a backup is deleted.
Is there any workaround available to the best of your knowledge?
No
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
3
Is this issue reproducible?
Yes
Can this issue reproduce from the UI?
No
If this is a regression, please provide more details to justify this:
n/a
Steps to Reproduce:
1. Create a sample application that uses the ocs-storagecluster-cephfs storage class
oc new-project testns
oc apply -f demo.cephfs.yaml
oc apply -f testpod.yaml
cat demo.cephfs.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: demo-cephfs-pvc
spec:
  storageClassName: ocs-storagecluster-cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 40Gi
cat testpod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: testpod
spec:
  containers:
    - command:
        - sleep
        - infinity
      image: registry.redhat.io/ubi8/ubi:latest
      imagePullPolicy: Always
      name: main
      resources: {}
      volumeMounts:
        - mountPath: /mnt
          name: cpd-data-vol
  restartPolicy: Never
  volumes:
    - name: cpd-data-vol
      persistentVolumeClaim:
        claimName: demo-cephfs-pvc
2. Using OADP, create a backup
./velero backup create mybackup --include-namespaces testns --exclude-resources='Event,Event.events.k8s.io'
3. Delete namespace
oc delete ns testns
4. Using OADP, restore
./velero restore create --from-backup mybackup myrestore --exclude-resources='ImageTag'
After restore, there are 2 volumesnapshotcontents, and 1 volumesnapshot
oc get volumesnapshotcontents
NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                                VOLUMESNAPSHOT                 AGE
snapcontent-fea465c8-5485-48ba-b3de-897bd0f1bc4c   true         42949672960   Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   4m12s
velero-velero-demo-cephfs-pvc-vpl4t-rdnbj          true         0             Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   32s
oc get volumesnapshot
NAME                           READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT                       RESTORESIZE   SNAPSHOTCLASS                                      SNAPSHOTCONTENT                             CREATIONTIME   AGE
velero-demo-cephfs-pvc-vpl4t   true                     velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   0             ocs-storagecluster-cephfsplugin-snapclass-velero   velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   36s            36s
5. Delete the backup
./velero backup delete mybackup
Actual results:
After deleting the backup, one of the volumesnapshotcontents still exists. An attempt to delete it manually hangs.
oc get volumesnapshotcontents
NAME                                        READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                                VOLUMESNAPSHOT                 AGE
velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   true         0             Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   77s
oc delete volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
(hangs)
Expected results:
volumesnapshotcontents associated with the backup or restore should be deleted.
At the very least, it should be possible to manually delete it.
~~~
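A possible way to narrow this down, offered as a sketch and not as a confirmed diagnosis: the snapshotter log reports `provided secret is empty`, and the external-snapshotter sidecar normally looks up the delete secret through the `snapshot.storage.kubernetes.io/deletion-secret-name` and `snapshot.storage.kubernetes.io/deletion-secret-namespace` annotations on the VolumeSnapshotContent. Whether those annotations are missing on the content recreated during restore is an assumption that this report does not confirm. The helper names `summarize_rpc_errors` and `check_deletion_secret` are hypothetical; the pod and object names in the usage comments are taken from the report above.

```shell
#!/bin/sh
# Hypothetical triage helpers for the stuck VolumeSnapshotContent.

# Count the distinct gRPC errors found in csi-snapshotter log text on stdin.
# Usage:
#   oc logs csi-cephfsplugin-provisioner-66c59d467f-ggwpd -c csi-snapshotter \
#     | summarize_rpc_errors
summarize_rpc_errors() {
  grep -o 'rpc error: code = [A-Za-z]* desc = [^"]*' | sort | uniq -c | sort -rn
}

# Print the delete secret referenced by the annotations of a
# VolumeSnapshotContent, or return 1 if either annotation is absent.
# Assumption: an absent annotation is what leads to "provided secret is empty".
# Usage:
#   check_deletion_secret velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
check_deletion_secret() {
  vsc="$1"
  name=$(oc get volumesnapshotcontent "$vsc" \
    -o jsonpath='{.metadata.annotations.snapshot\.storage\.kubernetes\.io/deletion-secret-name}')
  ns=$(oc get volumesnapshotcontent "$vsc" \
    -o jsonpath='{.metadata.annotations.snapshot\.storage\.kubernetes\.io/deletion-secret-namespace}')
  if [ -n "$name" ] && [ -n "$ns" ]; then
    echo "deletion secret: $ns/$name"
  else
    echo "deletion secret annotations missing on $vsc"
    return 1
  fi
}
```

If the second helper reports missing annotations, that would be consistent with the `InvalidArgument` error in the log, but it would not by itself show which component should have set them.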
- duplicates: OADP-528 The volumesnapshotcontent is not removed for the synced backup (Closed)