OpenShift API for Data Protection / OADP-478

volumesnapshotcontent cannot be deleted; SnapshotDeleteError Failed to delete snapshot

    Description

      Previously reported on https://bugzilla.redhat.com/show_bug.cgi?id=1951399
      This issue is in the scope of OADP.

      ~~~
      Description of problem (please be detailed as possible and provide log
      snippets):

      After restoring from an OADP backup that includes a CephFS CSI volume and then deleting the backup, a volumesnapshotcontent still exists. Attempting to delete it manually hangs.

      oc delete volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
      (hangs)

      oc describe volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj

      Spec:
        Deletion Policy:  Delete
        Driver:           openshift-storage.cephfs.csi.ceph.com
        Source:
          Snapshot Handle:  0001-0011-openshift-storage-0000000000000001-7594b7ad-a172-11eb-ba3e-0a580afe17a8
        Volume Snapshot Class Name:  ocs-storagecluster-cephfsplugin-snapclass-velero
        Volume Snapshot Ref:
          Kind:       VolumeSnapshot
          Name:       velero-demo-cephfs-pvc-vpl4t
          Namespace:  testns
          UID:        ce14ec3c-d8d6-4c83-a41a-f919a7d3966e
      Status:
        Creation Time:    1618880071837960692
        Ready To Use:     true
        Restore Size:     0
        Snapshot Handle:  0001-0011-openshift-storage-0000000000000001-7594b7ad-a172-11eb-ba3e-0a580afe17a8
      Events:
        Type     Reason               Age                    From                                                   Message
        ----     ------               ----                   ----                                                   -------
        Warning  SnapshotDeleteError  79m (x143 over 3h20m)  csi-snapshotter openshift-storage.cephfs.csi.ceph.com  Failed to delete snapshot
        Warning  SnapshotDeleteError  3m23s (x90 over 74m)   csi-snapshotter openshift-storage.cephfs.csi.ceph.com  Failed to delete snapshot

      oc logs csi-cephfsplugin-provisioner-66c59d467f-ggwpd -c csi-snapshotter

      I0420 01:08:31.456278 1 reflector.go:369] github.com/kubernetes-csi/external-snapshotter/client/v3/informers/externalversions/factory.go:117: forcing resync
      I0420 01:08:31.456388 1 snapshot_controller_base.go:140] enqueued "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj" for sync
      I0420 01:08:31.456421 1 snapshot_controller_base.go:174] syncContentByKey[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]
      I0420 01:08:31.456443 1 util.go:258] storeObjectUpdate updating content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj" with version 82402937
      I0420 01:08:31.456456 1 snapshot_controller.go:57] synchronizing VolumeSnapshotContent[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]
      I0420 01:08:31.456497 1 snapshot_controller.go:531] Check if VolumeSnapshotContent[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj] should be deleted.
      I0420 01:08:31.456524 1 snapshot_controller.go:60] VolumeSnapshotContent[velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]: the policy is Delete
      I0420 01:08:31.456532 1 snapshot_controller.go:92] Deleting snapshot for content: velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
      I0420 01:08:31.456537 1 snapshot_controller.go:329] deleteCSISnapshotOperation [velero-velero-demo-cephfs-pvc-vpl4t-rdnbj] started
      I0420 01:08:31.456542 1 snapshot_controller.go:181] getCSISnapshotInput for content [velero-velero-demo-cephfs-pvc-vpl4t-rdnbj]
      I0420 01:08:31.456546 1 snapshot_controller.go:439] getSnapshotClass: VolumeSnapshotClassName [ocs-storagecluster-cephfsplugin-snapclass-velero]
      E0420 01:08:31.457834 1 snapshot_controller_base.go:261] could not sync content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj": failed to delete snapshot "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", err: failed to delete snapshot content velero-velero-demo-cephfs-pvc-vpl4t-rdnbj: "rpc error: code = InvalidArgument desc = provided secret is empty"
      I0420 01:08:31.457873 1 snapshot_controller_base.go:163] Failed to sync content "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", will retry again: failed to delete snapshot "velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", err: failed to delete snapshot content velero-velero-demo-cephfs-pvc-vpl4t-rdnbj: "rpc error: code = InvalidArgument desc = provided secret is empty"
      I0420 01:08:31.458124 1 event.go:282] Event(v1.ObjectReference{Kind:"VolumeSnapshotContent", Namespace:"", Name:"velero-velero-demo-cephfs-pvc-vpl4t-rdnbj", UID:"8ae5a30f-f90d-4cf9-b98f-58ba895622ae", APIVersion:"snapshot.storage.k8s.io/v1beta1", ResourceVersion:"82402937", FieldPath:""}): type: 'Warning' reason: 'SnapshotDeleteError' Failed to delete snapshot

      Version of all relevant components (if applicable):
      OADP 0.2.0 with CSI plugin
      OCP 4.6.9
      OCS 4.6.4

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      If a volumesnapshotcontent cannot be deleted, it's possible that storage usage keeps increasing even though a backup is deleted.

      Is there any workaround available to the best of your knowledge?

      No

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      3

      Is this issue reproducible?

      Yes

      Can this issue be reproduced from the UI?

      No

      If this is a regression, please provide more details to justify this:

      n/a

      Steps to Reproduce:

      1. Create a sample application that uses the ocs-storagecluster-cephfs storage class
      oc new-project testns
      oc apply -f demo.cephfs.yaml
      oc apply -f testpod.yaml

      cat demo.cephfs.yaml
      kind: PersistentVolumeClaim
      apiVersion: v1
      metadata:
        name: demo-cephfs-pvc
      spec:
        storageClassName: ocs-storagecluster-cephfs
        accessModes:
          - ReadWriteMany
        resources:
          requests:
            storage: 40Gi

      cat testpod.yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: testpod
      spec:
        containers:
        - command:
          - sleep
          - infinity
          image: registry.redhat.io/ubi8/ubi:latest
          imagePullPolicy: Always
          name: main
          resources: {}
          volumeMounts:
          - mountPath: /mnt
            name: cpd-data-vol
        restartPolicy: Never
        volumes:
        - name: cpd-data-vol
          persistentVolumeClaim:
            claimName: demo-cephfs-pvc

      2. Using OADP, create a backup
      ./velero backup create mybackup --include-namespaces testns --exclude-resources='Event,Event.events.k8s.io'

      3. Delete namespace
      oc delete ns testns

      4. Using OADP, restore
      ./velero restore create --from-backup mybackup myrestore --exclude-resources='ImageTag'

      After restore, there are 2 volumesnapshotcontents, and 1 volumesnapshot

      oc get volumesnapshotcontents
      NAME                                               READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                                VOLUMESNAPSHOT                 AGE
      snapcontent-fea465c8-5485-48ba-b3de-897bd0f1bc4c   true         42949672960   Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   4m12s
      velero-velero-demo-cephfs-pvc-vpl4t-rdnbj          true         0             Retain           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   32s

      oc get volumesnapshot
      NAME                           READYTOUSE   SOURCEPVC   SOURCESNAPSHOTCONTENT                       RESTORESIZE   SNAPSHOTCLASS                                      SNAPSHOTCONTENT                             CREATIONTIME   AGE
      velero-demo-cephfs-pvc-vpl4t   true                     velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   0             ocs-storagecluster-cephfsplugin-snapclass-velero   velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   36s            36s

      5. Delete the backup
      ./velero backup delete mybackup

      Actual results:

      After deleting the backup, one of the volumesnapshotcontents still exists. Attempting to delete it manually hangs.

      oc get volumesnapshotcontents
      NAME                                        READYTOUSE   RESTORESIZE   DELETIONPOLICY   DRIVER                                  VOLUMESNAPSHOTCLASS                                VOLUMESNAPSHOT                 AGE
      velero-velero-demo-cephfs-pvc-vpl4t-rdnbj   true         0             Delete           openshift-storage.cephfs.csi.ceph.com   ocs-storagecluster-cephfsplugin-snapclass-velero   velero-demo-cephfs-pvc-vpl4t   77s

      oc delete volumesnapshotcontents velero-velero-demo-cephfs-pvc-vpl4t-rdnbj
      (hangs)

      Expected results:

      volumesnapshotcontents associated with the backup or restore should be deleted.
      At the very least, it should be possible to manually delete it.
      ~~~
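
The SnapshotDeleteError events in the report only say "Failed to delete snapshot"; the underlying cause ("provided secret is empty") appears only in the csi-snapshotter log. As a rough triage aid, the sketch below reads log lines on stdin and prints each distinct gRPC error with a count. The function name `snapshot_error_reasons` is hypothetical and not part of oc, Velero, or OADP, and it assumes the log keeps the `rpc error: code = ... desc = ...` shape shown in the report.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not an oc/Velero/OADP command).
# Reads csi-snapshotter log lines on stdin and prints "<count><TAB><code>: <description>"
# for each distinct gRPC error, most frequent first.
snapshot_error_reasons() {
  # Keep only the gRPC error fragment; the description ends at the closing quote.
  grep -o 'rpc error: code = [A-Za-z]* desc = [^"]*' \
    | sed 's/^rpc error: code = \([A-Za-z]*\) desc = \(.*\)$/\1: \2/' \
    | awk '{ n[$0]++ } END { for (r in n) printf "%d\t%s\n", n[r], r }' \
    | sort -rn
}

# Example against the provisioner pod named in this report (requires cluster access):
#   oc logs csi-cephfsplugin-provisioner-66c59d467f-ggwpd -c csi-snapshotter | snapshot_error_reasons
```

Each failed sync in the log above writes the error twice (one `E` line and one `I` retry line), so the counts are roughly double the number of sync attempts.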

            People

              sseago Scott Seago
              tkaovila@redhat.com Tiger Kaovilai
              Maya Peretz Maya Peretz