OpenShift API for Data Protection: OADP-3039

PodVolumeBackup/Restore CR status not marked as Failed after the backup/restore fails


      Description of problem:

      Deployed a stateful application with multiple PVCs. After the PodVolumeRestore CRs were created, the Velero pod was bounced (force-deleted). On restart, Velero marks the Restore as Failed, but the PodVolumeRestore CRs keep transferring data and their status is never set to Failed.

       

      Version-Release number of selected component (if applicable):
      OADP 1.3.0 - 138

       

      How reproducible:
      Always 

       

      Steps to Reproduce:
      1. Deploy a stateful application with multiple PVCs

      $ oc get pod -n ocp-8pvc-app
      NAME                     READY   STATUS    RESTARTS   AGE
      mysql-66865fdf8c-96pfj   1/1     Running   0          46m
      
      $ oc get pvc -n ocp-8pvc-app
      NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
      volume1   Bound    pvc-05aaa9c4-79d3-413b-8ddf-374a6cf30f0b   1Gi        RWO            gp3-csi        114s
      volume2   Bound    pvc-9f86600b-5c67-4034-833a-4d8f3f5ab2e3   1Gi        RWO            gp3-csi        114s
      volume3   Bound    pvc-5122c09a-5e51-41b8-9249-fc80adf4a1c2   1Gi        RWO            gp3-csi        114s
      volume4   Bound    pvc-b5c8239e-b43a-4159-9370-b925373a37bc   1Gi        RWO            gp3-csi        114s
      volume5   Bound    pvc-57aa8a7d-aca9-4d67-a878-3e476557ea1b   1Gi        RWO            gp3-csi        114s
      volume6   Bound    pvc-175204ab-2b5a-4632-abb1-f33013b6d309   1Gi        RWO            gp3-csi        114s
      volume7   Bound    pvc-a24a7aa2-da3e-46e9-986d-d28957feb74b   1Gi        RWO            gp3-csi        114s
      volume8   Bound    pvc-7176f1b7-099c-4e2b-b3c7-bf4dd602aeeb   1Gi        RWO            gp3-csi        113s
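
      In case the exact manifest is needed, a minimal sketch of such an application follows. This is a hypothetical manifest, not the one used above: the image, labels, and single PVC are illustrative assumptions; repeat the PVC and the volume/volumeMount entries for volume2 through volume8. Save it as app.yaml and apply it:

      $ oc new-project ocp-8pvc-app
      $ oc apply -n ocp-8pvc-app -f app.yaml

      # app.yaml: one RWO PVC plus a deployment that mounts it (hypothetical)
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: volume1
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: gp3-csi
        resources:
          requests:
            storage: 1Gi
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: app-with-8pvc
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: app-with-8pvc
        template:
          metadata:
            labels:
              app: app-with-8pvc
          spec:
            containers:
            - name: app
              image: registry.access.redhat.com/ubi9/ubi   # illustrative image
              command: ["sleep", "infinity"]
              volumeMounts:
              - name: volume1
                mountPath: /data/volume1
            volumes:
            - name: volume1
              persistentVolumeClaim:
                claimName: volume1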
      

      2. Create a file system backup (FSB) with restic or kopia

      $ oc get backup test-backup1 -o yaml
      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        annotations:
          velero.io/resource-timeout: 10m0s
          velero.io/source-cluster-k8s-gitversion: v1.26.9+636f2be
          velero.io/source-cluster-k8s-major-version: "1"
          velero.io/source-cluster-k8s-minor-version: "26"
        creationTimestamp: "2023-11-03T11:28:18Z"
        generation: 7
        labels:
          velero.io/storage-location: default
        name: test-backup1
        namespace: openshift-adp
        resourceVersion: "160865"
        uid: 6c0f06cc-fc79-454a-9a8c-d317513de50c
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToFsBackup: true
        includedNamespaces:
        - ocp-8pvc-app
        itemOperationTimeout: 4h0m0s
        snapshotMoveData: false
        storageLocation: default
        ttl: 720h0m0s
      status:
        completionTimestamp: "2023-11-03T11:35:46Z"
        expiration: "2023-12-03T11:28:18Z"
        formatVersion: 1.1.0
        phase: Completed
        progress:
          itemsBackedUp: 87
          totalItems: 87
        startTimestamp: "2023-11-03T11:28:18Z"
        version: 1
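
      One way to create an equivalent backup, assuming the velero CLI is installed and pointed at the OADP namespace (applying the Backup CR with oc works as well):

      $ velero backup create test-backup1 \
          --include-namespaces ocp-8pvc-app \
          --default-volumes-to-fs-backup \
          --namespace openshift-adp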
      

      3. Remove the app namespace and create a restore.

      $ oc delete ns ocp-8pvc-app
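
      Then create the restore from the backup (a sketch using the velero CLI; applying a Restore CR with oc works as well):

      $ velero restore create test-restore1 \
          --from-backup test-backup1 \
          --namespace openshift-adp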

      4. Wait until the PodVolumeRestore CRs are created

      $ oc get podvolumerestore -w
      NAME                  NAMESPACE      POD                              UPLOADER TYPE   VOLUME    STATUS   TOTALBYTES   BYTESDONE   AGE
      test-restore1-6vxk6   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume5                                     0s
      test-restore1-r7zzr   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume8                                     0s
      test-restore1-pgzbp   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume4                                     0s
      test-restore1-5m56l   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume1                                     0s
      test-restore1-96chn   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume2                                     0s
      test-restore1-rvw2s   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume6                                     0s
      test-restore1-wjt8q   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume7                                     0s
      test-restore1-x58h5   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume3                                     0s
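
      A polling sketch for this step (the velero.io/restore-name label on PodVolumeRestore CRs is an assumption about Velero's labeling; adjust the count to the number of PVCs):

      $ while [ "$(oc get podvolumerestore -n openshift-adp -l velero.io/restore-name=test-restore1 -o name | wc -l)" -lt 8 ]; do sleep 1; done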

      5. Bounce the Velero pod as soon as the PodVolumeRestore CRs are created

      $ oc delete pod velero-6d4b46949b-tb57s --force 
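
      If the pod name is not known in advance, a label selector can be used instead (the deploy=velero label comes from the upstream Velero deployment; treat it as an assumption for OADP):

      $ oc delete pod -n openshift-adp -l deploy=velero --force --grace-period=0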
      

      6. Verify that the Restore is marked as Failed

      $ oc get restore test-restore1 -o yaml
      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        creationTimestamp: "2023-11-03T11:41:30Z"
        finalizers:
        - restores.velero.io/external-resources-finalizer
        generation: 5
        name: test-restore1
        namespace: openshift-adp
        resourceVersion: "164684"
        uid: d853eece-e593-4d9a-ba34-7d8d21adef78
      spec:
        backupName: test-backup1
        excludedResources:
        - nodes
        - events
        - events.events.k8s.io
        - backups.velero.io
        - restores.velero.io
        - resticrepositories.velero.io
        - csinodes.storage.k8s.io
        - volumeattachments.storage.k8s.io
        - backuprepositories.velero.io
        itemOperationTimeout: 4h0m0s
      status:
        completionTimestamp: "2023-11-03T11:41:57Z"
        failureReason: found a restore with status "InProgress" during the server starting,
          mark it as "Failed"
        phase: Failed
        progress:
          itemsRestored: 39
          totalItems: 39
        startTimestamp: "2023-11-03T11:41:30Z" 
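
      A quick phase-only check (standard jsonpath output):

      $ oc get restore test-restore1 -n openshift-adp -o jsonpath='{.status.phase}{"\n"}'
      Failed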

      Actual results:

      The PodVolumeRestore CRs continue the data transfer and eventually report Completed for all volumes, even though the parent Restore has already been marked Failed.

      $ oc get podvolumerestore 
      NAME                  NAMESPACE      POD                              UPLOADER TYPE   VOLUME    STATUS      TOTALBYTES   BYTESDONE   AGE
      test-restore1-5m56l   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume1   Completed   104857654    104857654   10m
      test-restore1-6vxk6   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume5   Completed   104857654    104857654   10m
      test-restore1-96chn   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume2   Completed   104857654    104857654   10m
      test-restore1-pgzbp   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume4   Completed   104857654    104857654   10m
      test-restore1-r7zzr   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume8   Completed   104857654    104857654   10m
      test-restore1-rvw2s   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume6   Completed   104857654    104857654   10m
      test-restore1-wjt8q   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume7   Completed   104857654    104857654   10m
      test-restore1-x58h5   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume3   Completed   104857654    104857654   10m
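
      To list just the PodVolumeRestore names and phases (sketch; the velero.io/restore-name label selector is an assumption):

      $ oc get podvolumerestore -n openshift-adp -l velero.io/restore-name=test-restore1 \
          -o custom-columns=NAME:.metadata.name,PHASE:.status.phase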

      Expected results: 
      The PodVolumeRestore CRs should not continue to progress after the Restore is marked as Failed; their status should be marked as Failed as well.

       

      Additional info:

      Assignee: Wes Hayutin (wnstb)
      Reporter: Prasad Joshi (rhn-support-prajoshi)