OpenShift API for Data Protection / OADP-849

DataMover: restore PartiallyFails randomly with "ReplicationDestination.volsync.backube xxxx not found" error

Details

    • 0
    • 0
    • Very Likely
    • 0
    • None
    • Unset
    • Unknown
    • Proposed

    Description

      Description of problem: Starting with 1.1.1 (it is not clear in which build this was first introduced), restores fail intermittently with a "ReplicationDestination.volsync.backube xxxx not found" error (where xxxx is the name of the ReplicationDestination CR), even though the ReplicationDestination appears to be created eventually.

      The CSI VolumeSnapshot also appears to fail with the following error:

      'Failed to check and update snapshot content: failed to list snapshot
              for content velero-velero-mysql-tz5f4-4hq58: "rpc error: code = Internal desc
              = Could not list snapshots: InvalidParameterValue: Value ( 0 ) for parameter
              maxResults is invalid. Expecting a value greater than 5.\n\tstatus code: 400,
              request id: 655bbfa8-c7e0-4311-900b-092f2e617f8d"'

       

      Note that a restore of the same application can also succeed.

      Version-Release number of selected component (if applicable):

      1.1.1, build oadp-operator-bundle-container-1.1.1-21

       

      How reproducible: Frequently; the exact failure rate has not been measured.

      Steps to Reproduce:
      1. Create a backup of a stateful application with datamover for PV backup
      2. Make sure the backup completes successfully, with no errors on the VolumeSnapshots or on the VSB (the VSB is in the "Completed" phase)
      3. Delete the application namespace
      4. Once the namespace is removed, create a restore of the backup
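
Steps 3 and 4 above can be sketched as a small shell helper. This is only an illustration, not the exact procedure used in this report: the function names and the restore name `mysql-restore-repro` are hypothetical, while the namespaces and the backup name are taken from the output pasted below. It assumes the backup from steps 1 and 2 already exists and that Velero runs in `openshift-adp`.

```shell
# Defaults taken from the report; override through the environment as needed.
: "${APP_NS:=mysql-persistent}"
: "${OADP_NS:=openshift-adp}"
: "${BACKUP_NAME:=mysql-ad93ad8a-494e-11ed-b0c4-902e163f806c}"
# Hypothetical restore name, chosen only for this sketch.
: "${RESTORE_NAME:=mysql-restore-repro}"

# Print a Velero Restore CR that restores from the existing backup.
restore_manifest() {
  cat <<EOF
apiVersion: velero.io/v1
kind: Restore
metadata:
  name: ${RESTORE_NAME}
  namespace: ${OADP_NS}
spec:
  backupName: ${BACKUP_NAME}
EOF
}

# Step 3: delete the application namespace and wait until it is gone.
# Step 4: create the restore from the backup.
reproduce_restore() {
  oc delete namespace "${APP_NS}" --wait=true
  restore_manifest | oc create -f -
}
```

Calling `reproduce_restore` against a test cluster runs both steps; because the failure is intermittent, several attempts (each with a fresh `RESTORE_NAME`) may be needed before it shows up.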

      Actual results:

      Restore may fail with the following errors:

      VSR:

      [mperetz@fedora oadp-e2e-qe]$   oc get vsr -A -o yaml
      apiVersion: v1
      items:
      - apiVersion: datamover.oadp.openshift.io/v1alpha1
        kind: VolumeSnapshotRestore
        metadata:
          creationTimestamp: "2022-10-11T10:29:27Z"
          generateName: vsr-
          generation: 1
          labels:
            velero.io/persistent-volume-claim-name: mysql
            velero.io/restore-name: mysql-ad93ad8a-494e-11ed-b0c4-902e163f806c
          name: vsr-mjp78
          namespace: mysql-persistent
          resourceVersion: "65364"
          uid: 47455522-fc52-44d1-a14c-f35205a7389e
        spec:
          protectedNamespace: openshift-adp
          resticSecretRef:
            name: ts-dpa-1-volsync-restic
          volumeSnapshotMoverBackupRef:
            resticrepository: s3:s3.amazonaws.com/oadpbucket145568/openshift-adp/snapcontent-4131629e-a252-44ba-8ccf-99553ea06a7d-pvc
            sourcePVCData:
              name: mysql
              size: 2Gi
              storageClassName: gp2-csi
            volumeSnapshotClassName: example-snapclass
        status:
          conditions:
          - lastTransitionTime: "2022-10-11T10:29:27Z"
            message: ReplicationDestination.volsync.backube "vsr-mjp78-rep-dest" not found
            reason: Error
            status: "False"
            type: Reconciled
          phase: Failed
      kind: List
      metadata:
        resourceVersion: ""

      VolumeSnapshot:

      [mperetz@fedora oadp-e2e-qe]$ oc get volumesnapshot -A -o yaml
      apiVersion: v1
      items:
      - apiVersion: snapshot.storage.k8s.io/v1
        kind: VolumeSnapshot
        metadata:
          annotations:
            velero.io/csi-driver-name: ebs.csi.aws.com
            velero.io/csi-volumesnapshot-handle: snap-0e464beebb6c4c180
            velero.io/csi-vsc-deletion-policy: Retain
            velero.io/vsi-volumesnapshot-restore-size: 2Gi
          creationTimestamp: "2022-10-11T10:29:35Z"
          finalizers:
          - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
          generation: 1
          labels:
            velero.io/backup-name: mysql-ad93ad8a-494e-11ed-b0c4-902e163f806c
            velero.io/restore-name: mysql-ad93ad8a-494e-11ed-b0c4-902e163f806c
          name: velero-mysql-cwwp5
          namespace: mysql-persistent
          resourceVersion: "67546"
          uid: 0276f2a6-4d17-4dee-a0c6-1fa8f9b33d76
        spec:
          source:
            volumeSnapshotContentName: velero-velero-mysql-cwwp5-j4qdt
          volumeSnapshotClassName: example-snapclass
        status:
          boundVolumeSnapshotContentName: velero-velero-mysql-cwwp5-j4qdt
          error:
            message: 'Failed to check and update snapshot content: failed to list snapshot
              for content velero-velero-mysql-cwwp5-j4qdt: "rpc error: code = Internal desc
              = Could not list snapshots: InvalidParameterValue: Value ( 0 ) for parameter
              maxResults is invalid. Expecting a value greater than 5.\n\tstatus code: 400,
              request id: f18b6673-526e-49db-bff3-5635de93d5c7"'
            time: "2022-10-11T10:31:47Z"
          readyToUse: false
      - apiVersion: snapshot.storage.k8s.io/v1
        kind: VolumeSnapshot
        metadata:
          creationTimestamp: "2022-10-11T10:29:43Z"
          finalizers:
          - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
          generation: 1
          labels:
            app.kubernetes.io/created-by: volsync
          name: volsync-vsr-mjp78-rep-dest-dest-20221011102943
          namespace: openshift-adp
          ownerReferences:
          - apiVersion: volsync.backube/v1alpha1
            blockOwnerDeletion: true
            controller: true
            kind: ReplicationDestination
            name: vsr-mjp78-rep-dest
            uid: 466d66c3-b544-4460-bca8-9f383f80352d
          resourceVersion: "66319"
          uid: 236a5c9a-f9a3-4a9e-a83b-a2e7ed279cb3
        spec:
          source:
            persistentVolumeClaimName: volsync-vsr-mjp78-rep-dest-dest
          volumeSnapshotClassName: example-snapclass
        status:
          boundVolumeSnapshotContentName: snapcontent-236a5c9a-f9a3-4a9e-a83b-a2e7ed279cb3
          creationTime: "2022-10-11T10:29:45Z"
          readyToUse: true
          restoreSize: 2Gi
      kind: List
      metadata:
        resourceVersion: ""
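
To spot the two failure signatures above without reading the full YAML, the same `oc get` calls can be narrowed with a JSONPath filter. This is a hedged sketch: the function names are hypothetical, and it assumes the failing VSR reports `phase: Failed` and the failing VolumeSnapshot reports `readyToUse: false`, as in the output above.

```shell
# List VolumeSnapshotRestores in the Failed phase with their first condition message.
failed_vsrs() {
  oc get vsr -A -o jsonpath='{range .items[?(@.status.phase=="Failed")]}{.metadata.namespace}{"/"}{.metadata.name}{": "}{.status.conditions[0].message}{"\n"}{end}'
}

# List VolumeSnapshots that are not ready, with any reported error message.
unready_snapshots() {
  oc get volumesnapshot -A -o jsonpath='{range .items[?(@.status.readyToUse==false)]}{.metadata.namespace}{"/"}{.metadata.name}{": "}{.status.error.message}{"\n"}{end}'
}
```

Running `failed_vsrs` and `unready_snapshots` right after a restore gives one line per affected object, which makes it easier to compare passing and failing runs.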

      Expected results: Restore should pass

       

      Additional info:

            People

              spampatt@redhat.com Shubham Pampattiwar
              mperetz@redhat.com Maya Peretz