  1. OpenShift API for Data Protection
  2. OADP-655

DataMover: volsync-dst-vsr pod completes although not all items were restored in the namespace


    • False
    • False
    • oadp-operator-bundle-container-1.1.0-49
    • ToDo
    • 0
    • 0
    • Very Likely
    • 0
    • None
    • Unset
    • Unknown
    • Proposed
    • Yes

      Description of problem: The volsync-dst-vsr pod completes although not all items were restored in the namespace.

      [mperetz@mperetz oadp-e2e-qe]$ oc get pods -n openshift-adp
      NAME                                               READY   STATUS      RESTARTS   AGE
      openshift-adp-controller-manager-c554d65f9-lhgj8   1/1     Running     0          87m
      restic-248ss                                       1/1     Running     0          41m
      restic-q9v5v                                       1/1     Running     0          41m
      restic-vkvbt                                       1/1     Running     0          41m
      velero-5c6fcff564-49fph                            1/1     Running     0          26m
      volsync-dst-vsr-mysql-rep-dest-c68l7               0/1     Completed   0          11m
      volume-snapshot-mover-64cdcf4b97-djl25             1/1     Running     0          41m
       

      Logs of the completed pod:

      [mperetz@mperetz oadp-e2e-qe]$ oc logs volsync-dst-vsr-mysql-rep-dest-c68l7 -n openshift-adp
      Starting container
      VolSync restic container version: ACM-0.4.1-e6dde1b
      restore
      Testing mandatory env variables
      === Starting restore ===
      /data /
      Selected restic snapshot with id: a150ca03
      restoring <Snapshot a150ca03 of [/data] at 2022-07-29 15:05:20.610304966 +0000 UTC by root@volsync> to .
      /
      === Done ===
       

      Eventually the restore times out with the following errors:

      [mperetz@mperetz oadp-e2e-qe]$ velero restore logs mysql-87c9b19f-0f48-11ed-946b-902e163f806c -n openshift-adp | grep error
      time="2022-07-29T14:32:16Z" level=error msg="Timed out awaiting reconciliation of volumesnapshotrestore vsr-mysql" cmd=/plugins/velero-plugin-for-csi logSource="/remote-source/app/internal/util/util.go:392" pluginName=velero-plugin-for-csi restore=openshift-adp/mysql-87c9b19f-0f48-11ed-946b-902e163f806c
      time="2022-07-29T14:32:19Z" level=error msg="Namespace mysql-persistent, resource restore error: error preparing volumesnapshotbackups.datamover.oadp.openshift.io/mysql-persistent/vsb-velero-mysql-l6qtf: rpc error: code = Unknown desc = timed out waiting for the condition" logSource="pkg/controller/restore_controller.go:504" restore=openshift-adp/mysql-87c9b19f-0f48-11ed-946b-902e163f806c
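The restore log lines above are in logfmt style (key=value pairs, with quoted values where needed). As a triage aid, the following is a minimal Python sketch that pulls the error messages out of such output. The helper names `parse_logfmt` and `error_messages` are illustrative and not part of velero; the sample line is shortened from the log above.

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Split one logfmt-style line into a dict of key -> value."""
    fields = {}
    try:
        tokens = shlex.split(line)  # honours double-quoted values
    except ValueError:
        # Unbalanced quotes: treat the line as unparseable.
        return fields
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def error_messages(log_text: str) -> list:
    """Return the msg field of every line whose level is 'error'."""
    messages = []
    for line in log_text.splitlines():
        fields = parse_logfmt(line)
        if fields.get("level") == "error" and "msg" in fields:
            messages.append(fields["msg"])
    return messages


sample = (
    'time="2022-07-29T14:32:16Z" level=error '
    'msg="Timed out awaiting reconciliation of volumesnapshotrestore vsr-mysql" '
    'cmd=/plugins/velero-plugin-for-csi pluginName=velero-plugin-for-csi'
)
print(error_messages(sample))
```

The text passed to `error_messages` could be the captured output of `velero restore logs <restore-name> -n openshift-adp`.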
       

      Version-Release number of selected component (if applicable):

      Downstream build 1.1.0-45.

      Checked only on OCP 4.11 so far.

      How reproducible: 

       

      Steps to Reproduce:
      1. Install VolSync from the latest stable channel:

      cat <<EOF | oc create -f -
      apiVersion: operators.coreos.com/v1alpha1
      kind: Subscription
      metadata:
        generateName: oadp-
        name: oadp-operator
        namespace: openshift-operators
      spec:
        channel: stable
        installPlanApproval: Automatic
        name: volsync-product
        source: prestage-operators
        sourceNamespace: openshift-marketplace
      EOF

      2. Create a VolumeSnapshotClass (VSC):

      apiVersion: snapshot.storage.k8s.io/v1
      deletionPolicy: Retain
      driver: ebs.csi.aws.com
      kind: VolumeSnapshotClass
      metadata:
        annotations:
          snapshot.storage.kubernetes.io/is-default-class: "true"
        labels:
          velero.io/csi-volumesnapshot-class: "true"
        name: example-snapclass

      2. Create the restic secret with the default name dm-credential:

      apiVersion: v1
      data:
        AWS_ACCESS_KEY_ID: XXXXXXX
        AWS_SECRET_ACCESS_KEY: XXXXXXXXXXXXXXX
        RESTIC_PASSWORD: my-secure-restic-password
        RESTIC_REPOSITORY: s3:s3.amazonaws.com/oadpbucket119606
      kind: Secret
      metadata:
        name: dm-credential
        namespace: openshift-adp
      type: Opaque
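Note for anyone reproducing this: Kubernetes expects the values under `data` in a Secret to be base64-encoded (plain-text values belong under `stringData`). The values shown above are redacted and appear in decoded form, which is presumably just how they were pasted into the report. A minimal Python sketch for producing the encoded values, with the hypothetical helper name `encode_secret_data`:

```python
import base64


def encode_secret_data(plain: dict) -> dict:
    """Base64-encode each value so it can be placed under a Secret's data field."""
    return {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in plain.items()
    }


# Placeholder values taken from the redacted example above.
plain = {
    "RESTIC_PASSWORD": "my-secure-restic-password",
    "RESTIC_REPOSITORY": "s3:s3.amazonaws.com/oadpbucket119606",
}
encoded = encode_secret_data(plain)
for key, value in encoded.items():
    print(f"  {key}: {value}")
```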

      3. Create the DataProtectionApplication (DPA) CR:

      apiVersion: v1
      items:
      - apiVersion: oadp.openshift.io/v1alpha1
        kind: DataProtectionApplication
        metadata:
          creationTimestamp: "2022-07-29T14:42:53Z"
          generation: 1
          name: ts-dpa
          namespace: openshift-adp
          resourceVersion: "110749"
          uid: deadc7d8-261e-4ea3-b8e5-86330e005ee3
        spec:
          backupLocations:
          - velero:
              config:
                region: us-east-2
              credential:
                key: cloud
                name: cloud-credentials
              default: true
              objectStorage:
                bucket: oadpbucket125675
                prefix: velero-e2e-b4587c4f-0f4c-11ed-970c-902e163f806c
              provider: aws
          configuration:
            restic:
              enable: true
              podConfig:
                resourceAllocations: {}
            velero:
              defaultPlugins:
              - openshift
              - aws
              - kubevirt
              - csi
          features:
            dataMover:
              enable: true
          podDnsConfig: {}
          snapshotLocations: []
        status:
          conditions:
          - lastTransitionTime: "2022-07-29T14:42:53Z"
            message: Reconcile complete
            reason: Complete
            status: "True"
            type: Reconciled
      kind: List
      metadata:
        resourceVersion: ""
        selfLink: ""

      4. Create a backup of an application (mysql in my case). Make sure that both the VSB and the backup reach status Completed.

      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/source-cluster-k8s-gitversion: v1.24.0+9546431
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "24"
        creationTimestamp: "2022-07-29T14:56:53Z"
        generation: 5
        labels:
          velero.io/storage-location: ts-dpa-1
        name: backup11
        namespace: openshift-adp
        resourceVersion: "168532"
        uid: b1a7923c-3806-4c11-a765-6e588bc5042c
      spec:
        defaultVolumesToRestic: false
        hooks: {}
        includedNamespaces:
        - mysql-persistent
        metadata: {}
        storageLocation: ts-dpa-1
        ttl: 720h0m0s
      status:
        completionTimestamp: "2022-07-29T15:06:10Z"
        csiVolumeSnapshotsAttempted: 1
        csiVolumeSnapshotsCompleted: 1
        expiration: "2022-08-28T14:58:49Z"
        formatVersion: 1.1.0
        phase: Completed
        progress:
          itemsBackedUp: 54
          totalItems: 54
        startTimestamp: "2022-07-29T14:58:53Z"
        version: 1

      5. Delete the application namespace.

      6. Create a restore:
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        name: backup11-20220729151343
        namespace: openshift-adp
      spec:
        backupName: backup11
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        hooks: {}
        includedNamespaces:
        - '*'
        restorePVs: true

       

      Actual results:

      The VolumeSnapshotContent fails with this error:

      [mperetz@mperetz oadp-e2e-qe]$ oc get volumesnapshotcontent velero-velero-mysql-5q4v6-rkp9h -o yaml
      apiVersion: snapshot.storage.k8s.io/v1
      kind: VolumeSnapshotContent
      metadata:
        creationTimestamp: "2022-07-29T15:24:36Z"
        finalizers:
        - snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
        generateName: velero-velero-mysql-5q4v6-
        generation: 2
        labels:
          velero.io/restore-name: backup11-20220729151343
        name: velero-velero-mysql-5q4v6-rkp9h
        resourceVersion: "240517"
        uid: 1edb74f3-c3ce-4e6c-8719-639f082cbe7f
      spec:
        deletionPolicy: Retain
        driver: ebs.csi.aws.com
        source:
          snapshotHandle: ""
        volumeSnapshotClassName: example-snapclass
        volumeSnapshotRef:
          kind: VolumeSnapshot
          name: velero-mysql-5q4v6
          namespace: mysql-persistent
          uid: 4d91f144-44f2-456e-a764-aeb2e12fc36f
      status:
        error:
          message: 'Failed to check and update snapshot content: failed to list snapshot
            for content velero-velero-mysql-5q4v6-rkp9h: "rpc error: code = Internal desc
            = Could not list snapshots: InvalidParameterValue: Value ( 0 ) for parameter
            maxResults is invalid. Expecting a value greater than 5.\n\tstatus code: 400,
            request id: 6c742d66-ba21-4524-a920-35d7068b85b7"'
          time: "2022-07-29T15:36:19Z"
        readyToUse: false
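One detail visible in the object above is that `spec.source.snapshotHandle` is an empty string. Whether that is what triggers the `maxResults` error from the CSI driver is an assumption, not something this report confirms. The Python sketch below lists VolumeSnapshotContents that carry neither a `snapshotHandle` nor a `volumeHandle`; it expects the parsed output of `oc get volumesnapshotcontent -o json`, and the function name `contents_missing_handle` is illustrative.

```python
def contents_missing_handle(obj: dict) -> list:
    """Return names of VolumeSnapshotContents whose spec.source has no handle.

    Accepts either a List object (with an "items" key) or a single object.
    """
    items = obj.get("items", [obj])
    missing = []
    for item in items:
        source = item.get("spec", {}).get("source", {})
        # An empty string counts as missing, as in the object shown above.
        if not source.get("snapshotHandle") and not source.get("volumeHandle"):
            missing.append(item["metadata"]["name"])
    return missing


# Sample trimmed from the output above; the second entry is made up for contrast.
sample = {
    "items": [
        {
            "metadata": {"name": "velero-velero-mysql-5q4v6-rkp9h"},
            "spec": {"source": {"snapshotHandle": ""}},
        },
        {
            "metadata": {"name": "hypothetical-content-with-handle"},
            "spec": {"source": {"snapshotHandle": "snap-0123456789abcdef0"}},
        },
    ]
}
print(contents_missing_handle(sample))
```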
       

       

      [mperetz@mperetz oadp-e2e-qe]$ oc get pods -n mysql-persistent
      NAME                     READY   STATUS    RESTARTS   AGE
      mysql-65988b478c-kvrsn   0/1     Pending   0          31m
      [mperetz@mperetz oadp-e2e-qe]$ oc get pvc -n mysql-persistent
      NAME    STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      mysql   Pending                                      gp2-csi        18m
      [mperetz@mperetz oadp-e2e-qe]$ 
       

       

      [mperetz@mperetz oadp-e2e-qe]$ velero restore get -n openshift-adp -o yaml
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        creationTimestamp: "2022-07-29T15:13:43Z"
        generation: 9
        managedFields:
        - apiVersion: velero.io/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              .: {}
              f:backupName: {}
              f:hooks: {}
              f:includedNamespaces: {}
              f:restorePVs: {}
            f:status: {}
          manager: velero
          operation: Update
          time: "2022-07-29T15:13:43Z"
        - apiVersion: velero.io/v1
          fieldsType: FieldsV1
          fieldsV1:
            f:spec:
              f:excludedResources: {}
            f:status:
              f:completionTimestamp: {}
              f:errors: {}
              f:phase: {}
              f:progress:
                .: {}
                f:itemsRestored: {}
                f:totalItems: {}
              f:startTimestamp: {}
              f:warnings: {}
          manager: velero-server
          operation: Update
          time: "2022-07-29T15:24:40Z"
        name: backup11-20220729151343
        namespace: openshift-adp
        resourceVersion: "208936"
        uid: 85163de2-e9a8-450c-b9da-432d2593d21a
      spec:
        backupName: backup11
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        hooks: {}
        includedNamespaces:
        - '*'
        restorePVs: true
      status:
        completionTimestamp: "2022-07-29T15:24:40Z"
        errors: 2
        phase: PartiallyFailed
        progress:
          itemsRestored: 36
          totalItems: 36
        startTimestamp: "2022-07-29T15:14:03Z"
        warnings: 8
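Note that the restore status above reports `itemsRestored: 36` of `totalItems: 36` while the phase is `PartiallyFailed` with 2 errors, so the progress counters alone are not a success signal. A minimal Python sketch of a stricter check, for use in a test harness, assuming the Restore has been fetched as JSON (for example with `oc get restore <name> -n openshift-adp -o json`); the helper name `restore_succeeded` is hypothetical.

```python
def restore_succeeded(restore: dict) -> bool:
    """Treat a Restore as successful only if it Completed without errors."""
    status = restore.get("status", {})
    return status.get("phase") == "Completed" and not status.get("errors")


# Status block trimmed from the output above.
observed = {
    "status": {
        "phase": "PartiallyFailed",
        "errors": 2,
        "warnings": 8,
        "progress": {"itemsRestored": 36, "totalItems": 36},
    }
}
print(restore_succeeded(observed))
```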
       

       

      VSR:

      [mperetz@mperetz oadp-e2e-qe]$ oc get vsr -A -o yaml
      apiVersion: v1
      items:
      - apiVersion: datamover.oadp.openshift.io/v1alpha1
        kind: VolumeSnapshotRestore
        metadata:
          creationTimestamp: "2022-07-29T15:14:30Z"
          generation: 1
          labels:
            velero.io/restore-name: backup11-20220729151343
          name: vsr-mysql
          namespace: mysql-persistent
          resourceVersion: "186253"
          uid: d78c1cc9-2fb2-46aa-b21b-049d236eca07
        spec:
          protectedNamespace: openshift-adp
          resticSecretRef:
            name: ts-dpa-1-volsync-restic
          volumeSnapshotMoverBackupRef:
            resticrepository: s3:s3.amazonaws.com/oadpbucket125675/openshift-adp/snapcontent-a82c0c9c-cefe-4dbe-b148-df9aa9d3fb3b-pvc
            sourcePVCData:
              name: mysql
              size: 2Gi
              storageClassName: gp2-csi
            volumeSnapshotClassName: example-snapclass
      kind: List
      metadata:
        resourceVersion: ""
        selfLink: ""
       

      ReplicationDestination:

      [mperetz@mperetz oadp-e2e-qe]$ oc get replicationdestination -A -o yaml
      apiVersion: v1
      items:
      - apiVersion: volsync.backube/v1alpha1
        kind: ReplicationDestination
        metadata:
          creationTimestamp: "2022-07-29T15:13:18Z"
          generation: 1
          labels:
            datamover.oadp.openshift.io/vsr: vsr-mysql
          name: vsr-mysql-rep-dest
          namespace: openshift-adp
          resourceVersion: "183468"
          uid: 67861e07-8e8f-47d3-8dfa-442c3a05a5d9
        spec:
          restic:
            accessModes:
            - ReadWriteOnce
            capacity: 2Gi
            copyMethod: Snapshot
            repository: vsr-mysql-secret
            storageClassName: gp2-csi
            volumeSnapshotClassName: gp2-csi
          trigger:
            manual: vsr-mysql-trigger
        status:
          conditions:
          - lastTransitionTime: "2022-07-29T15:13:18Z"
            message: Synchronization in-progress
            reason: SyncInProgress
            status: "True"
            type: Synchronizing
          - lastTransitionTime: "2022-07-29T15:13:19Z"
            message: Reconcile complete
            reason: ReconcileComplete
            status: "True"
            type: Reconciled
          lastSyncStartTime: "2022-07-29T15:13:18Z"
      kind: List
      metadata:
        resourceVersion: ""
        selfLink: ""
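The ReplicationDestination above still reports `Synchronizing` with only `lastSyncStartTime` set, even though the mover pod has already completed. The sketch below classifies that state in Python. It assumes VolSync records a `lastSyncTime` in the status once a synchronization finishes; that field name is an assumption based on the VolSync API and is not shown anywhere in this report, and `sync_state` is a hypothetical helper.

```python
def sync_state(rd: dict) -> str:
    """Classify a ReplicationDestination as 'synced', 'in-progress' or 'unknown'."""
    status = rd.get("status", {})
    # Assumption: lastSyncTime appears only after a sync has finished.
    if status.get("lastSyncTime"):
        return "synced"
    for cond in status.get("conditions", []):
        if cond.get("type") == "Synchronizing" and cond.get("status") == "True":
            return "in-progress"
    return "unknown"


# Status trimmed from the output above.
observed = {
    "status": {
        "conditions": [
            {"type": "Synchronizing", "status": "True", "reason": "SyncInProgress"},
            {"type": "Reconciled", "status": "True", "reason": "ReconcileComplete"},
        ],
        "lastSyncStartTime": "2022-07-29T15:13:18Z",
    }
}
print(sync_state(observed))
```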
       

      VolumeSnapshotContents created on restore:

      [mperetz@mperetz oadp-e2e-qe]$ oc get volumesnapshotcontent velero-velero-mysql-5q4v6-gsf58 -o yaml
      apiVersion: snapshot.storage.k8s.io/v1
      kind: VolumeSnapshotContent
      metadata:
        annotations:
          snapshot.storage.kubernetes.io/volumesnapshot-being-deleted: "yes"
        creationTimestamp: "2022-07-29T15:46:14Z"
        finalizers:
        - snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
        generateName: velero-velero-mysql-5q4v6-
        generation: 2
        labels:
          velero.io/restore-name: backup11-20220729154552
        name: velero-velero-mysql-5q4v6-gsf58
        resourceVersion: "288694"
        uid: fc3cb8db-702c-4ea0-82f3-b74f736f980a
      spec:
        deletionPolicy: Retain
        driver: ebs.csi.aws.com
        source:
          snapshotHandle: ""
        volumeSnapshotClassName: example-snapclass
        volumeSnapshotRef:
          kind: VolumeSnapshot
          name: velero-mysql-5q4v6
          namespace: mysql-persistent
          uid: 5f49eec8-d82e-4dc3-9655-a55121385df7
      status:
        error:
          message: 'Failed to check and update snapshot content: failed to list snapshot
            for content velero-velero-mysql-5q4v6-gsf58: "rpc error: code = Internal desc
            = Could not list snapshots: InvalidParameterValue: Value ( 0 ) for parameter
            maxResults is invalid. Expecting a value greater than 5.\n\tstatus code: 400,
            request id: aabce59e-2547-47a1-b9fb-89b1a8293dfc"'
          time: "2022-07-29T15:55:31Z"
        readyToUse: false
      [mperetz@mperetz oadp-e2e-qe]$ oc get volumesnapshotcontent  -o yaml
      apiVersion: v1
      items:
      - apiVersion: snapshot.storage.k8s.io/v1
        kind: VolumeSnapshotContent
        metadata:
          annotations:
            snapshot.storage.kubernetes.io/volumesnapshot-being-deleted: "yes"
          creationTimestamp: "2022-07-29T15:46:14Z"
          finalizers:
          - snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
          generateName: velero-velero-mysql-5q4v6-
          generation: 2
          labels:
            velero.io/restore-name: backup11-20220729154552
          name: velero-velero-mysql-5q4v6-gsf58
          resourceVersion: "289042"
          uid: fc3cb8db-702c-4ea0-82f3-b74f736f980a
        spec:
          deletionPolicy: Retain
          driver: ebs.csi.aws.com
          source:
            snapshotHandle: ""
          volumeSnapshotClassName: example-snapclass
          volumeSnapshotRef:
            kind: VolumeSnapshot
            name: velero-mysql-5q4v6
            namespace: mysql-persistent
            uid: 5f49eec8-d82e-4dc3-9655-a55121385df7
        status:
          error:
            message: 'Failed to check and update snapshot content: failed to list snapshot
              for content velero-velero-mysql-5q4v6-gsf58: "rpc error: code = Internal desc
              = Could not list snapshots: InvalidParameterValue: Value ( 0 ) for parameter
              maxResults is invalid. Expecting a value greater than 5.\n\tstatus code: 400,
              request id: 9d7f2eec-e99e-46c6-a40e-5b51481aaa52"'
            time: "2022-07-29T15:55:39Z"
          readyToUse: false
      - apiVersion: snapshot.storage.k8s.io/v1
        kind: VolumeSnapshotContent
        metadata:
          annotations:
            snapshot.storage.kubernetes.io/volumesnapshot-being-deleted: "yes"
          creationTimestamp: "2022-07-29T15:46:16Z"
          finalizers:
          - snapshot.storage.kubernetes.io/volumesnapshotcontent-bound-protection
          generateName: velero-velero-mysql-tmlk6-
          generation: 2
          labels:
            velero.io/restore-name: backup11-20220729154552
          name: velero-velero-mysql-tmlk6-tl6fd
          resourceVersion: "289030"
          uid: 289a9a25-baf5-4db4-9afe-60ca2bd32487
        spec:
          deletionPolicy: Retain
          driver: ebs.csi.aws.com
          source:
            snapshotHandle: ""
          volumeSnapshotClassName: example-snapclass
          volumeSnapshotRef:
            kind: VolumeSnapshot
            name: velero-mysql-tmlk6
            namespace: mysql-persistent
            uid: fd7e7d52-0237-4a25-8be9-917b10e7d6af
        status:
          error:
            message: 'Failed to check and update snapshot content: failed to list snapshot
              for content velero-velero-mysql-tmlk6-tl6fd: "rpc error: code = Internal desc
              = Could not list snapshots: InvalidParameterValue: Value ( 0 ) for parameter
              maxResults is invalid. Expecting a value greater than 5.\n\tstatus code: 400,
              request id: ac2e86f0-8f56-4572-b81d-acb4d3db7df2"'
            time: "2022-07-29T15:55:39Z"
          readyToUse: false
      kind: List
      metadata:
        resourceVersion: ""
        selfLink: ""
       

       

      [mperetz@mperetz oadp-e2e-qe]$ oc get volumesnapshot -A -o yaml
      apiVersion: v1
      items:
      - apiVersion: snapshot.storage.k8s.io/v1
        kind: VolumeSnapshot
        metadata:
          annotations:
            velero.io/csi-driver-name: ebs.csi.aws.com
            velero.io/csi-volumesnapshot-handle: snap-03f2d51f00ed1cac4
            velero.io/csi-vsc-deletion-policy: Retain
            velero.io/vsi-volumesnapshot-restore-size: 2Gi
          creationTimestamp: "2022-07-29T15:57:12Z"
          finalizers:
          - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
          generation: 1
          labels:
            velero.io/backup-name: backup11
            velero.io/restore-name: backup11-20220729154647
          name: velero-mysql-5q4v6
          namespace: mysql-persistent
          resourceVersion: "293317"
          uid: 70122683-124b-4c1a-ac65-096b0796f41f
        spec:
          source:
            volumeSnapshotContentName: velero-velero-mysql-5q4v6-qmbp8
          volumeSnapshotClassName: example-snapclass
        status:
          boundVolumeSnapshotContentName: velero-velero-mysql-5q4v6-qmbp8
          error:
            message: 'Failed to check and update snapshot content: failed to list snapshot
              for content velero-velero-mysql-5q4v6-qmbp8: "rpc error: code = Internal desc
              = Could not list snapshots: InvalidParameterValue: Value ( 0 ) for parameter
              maxResults is invalid. Expecting a value greater than 5.\n\tstatus code: 400,
              request id: e4639c97-5557-448e-952f-c69931a9b31e"'
            time: "2022-07-29T15:57:17Z"
          readyToUse: false
      - apiVersion: snapshot.storage.k8s.io/v1
        kind: VolumeSnapshot
        metadata:
          annotations:
            velero.io/csi-driver-name: ebs.csi.aws.com
            velero.io/csi-volumesnapshot-handle: snap-00f64b33c28e4c73a
            velero.io/csi-vsc-deletion-policy: Retain
            velero.io/vsi-volumesnapshot-restore-size: 2Gi
          creationTimestamp: "2022-07-29T15:57:15Z"
          finalizers:
          - snapshot.storage.kubernetes.io/volumesnapshot-as-source-protection
          generation: 1
          labels:
            velero.io/backup-name: backup11
            velero.io/restore-name: backup11-20220729154647
          name: velero-mysql-tmlk6
          namespace: mysql-persistent
          resourceVersion: "293288"
          uid: 57a52438-916b-44de-9d28-8617a9cede13
        spec:
          source:
            volumeSnapshotContentName: velero-velero-mysql-tmlk6-rvbx2
          volumeSnapshotClassName: example-snapclass
        status:
          boundVolumeSnapshotContentName: velero-velero-mysql-tmlk6-rvbx2
          error:
            message: 'Failed to check and update snapshot content: failed to list snapshot
              for content velero-velero-mysql-tmlk6-rvbx2: "rpc error: code = Internal desc
              = Could not list snapshots: InvalidParameterValue: Value ( 0 ) for parameter
              maxResults is invalid. Expecting a value greater than 5.\n\tstatus code: 400,
              request id: c72f7514-3cc5-4c96-9b0a-e2e55d407893"'
            time: "2022-07-29T15:57:16Z"
          readyToUse: false
      - apiVersion: snapshot.storage.k8s.io/v1
        kind: VolumeSnapshot
        metadata:
          creationTimestamp: "2022-07-29T15:41:23Z"
          generation: 1
          name: volsync-vsr-mysql-rep-dest-dest-20220729151344
          namespace: openshift-adp
          ownerReferences:
          - apiVersion: volsync.backube/v1alpha1
            blockOwnerDeletion: true
            controller: true
            kind: ReplicationDestination
            name: vsr-mysql-rep-dest
            uid: 67861e07-8e8f-47d3-8dfa-442c3a05a5d9
          resourceVersion: "254288"
          uid: ebecf384-c659-482c-9b0a-1ebcf0e967f9
        spec:
          source:
            persistentVolumeClaimName: volsync-vsr-mysql-rep-dest-dest
          volumeSnapshotClassName: gp2-csi
        status:
          error:
            message: Failed to get snapshot class with error volumesnapshotclass.snapshot.storage.k8s.io
              "gp2-csi" not found
            time: "2022-07-29T15:41:23Z"
      kind: List
      metadata:
        resourceVersion: ""
        selfLink: ""
      [mperetz@mperetz oadp-e2e-qe]$ oc get storageclass
      NAME                PROVISIONER             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
      gp2                 kubernetes.io/aws-ebs   Delete          WaitForFirstConsumer   true                   153m
      gp2-csi (default)   ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   153m
      gp3-csi             ebs.csi.aws.com         Delete          WaitForFirstConsumer   true                   153m
      [mperetz@mperetz oadp-e2e-qe]$ 
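The output above shows that `gp2-csi` exists as a StorageClass, while the VolSync-owned VolumeSnapshot fails with `volumesnapshotclass.snapshot.storage.k8s.io "gp2-csi" not found`; the only VolumeSnapshotClass created in the reproduction steps is `example-snapclass`. The following Python sketch flags VolumeSnapshots that reference a class name that does not exist. It works on the parsed JSON of `oc get volumesnapshot -A -o json` and `oc get volumesnapshotclass -o json`; the function name is illustrative and the sample data is trimmed from this report.

```python
def unknown_snapshot_classes(volume_snapshots: dict, snapshot_classes: dict) -> list:
    """Return (namespace/name, className) for snapshots whose class is not defined."""
    known = {c["metadata"]["name"] for c in snapshot_classes.get("items", [])}
    problems = []
    for snap in volume_snapshots.get("items", []):
        class_name = snap.get("spec", {}).get("volumeSnapshotClassName")
        if class_name and class_name not in known:
            meta = snap["metadata"]
            problems.append((f'{meta["namespace"]}/{meta["name"]}', class_name))
    return problems


classes = {"items": [{"metadata": {"name": "example-snapclass"}}]}
snapshots = {
    "items": [
        {
            "metadata": {"namespace": "mysql-persistent", "name": "velero-mysql-5q4v6"},
            "spec": {"volumeSnapshotClassName": "example-snapclass"},
        },
        {
            "metadata": {
                "namespace": "openshift-adp",
                "name": "volsync-vsr-mysql-rep-dest-dest-20220729151344",
            },
            "spec": {"volumeSnapshotClassName": "gp2-csi"},
        },
    ]
}
print(unknown_snapshot_classes(snapshots, classes))
```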
       

       

      Expected results:

       

      Additional info:

              spampatt@redhat.com Shubham Pampattiwar
              mperetz@redhat.com Maya Peretz