Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Version: OADP 1.3.0
Description of problem:
Deployed a stateful application with multiple PVCs. The velero pod was bounced right after the PodVolumeRestore CRs were created; the PodVolumeRestores continue the data transfer even though the restore is marked as Failed.
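The mismatch is easy to observe (assuming the default openshift-adp install namespace): the Restore reports Failed while the PodVolumeRestore CRs are still moving data.

$ oc get restore test-restore1 -n openshift-adp -o jsonpath='{.status.phase}{"\n"}'
Failed
$ oc get podvolumerestore -n openshift-adp -w
(BYTESDONE keeps increasing even though the restore is already Failed)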
Version-Release number of selected component (if applicable):
OADP 1.3.0 - 138
How reproducible:
Always
Steps to Reproduce:
1. Deploy a stateful application
$ oc get pod -n ocp-8pvc-app
NAME                     READY   STATUS    RESTARTS   AGE
mysql-66865fdf8c-96pfj   1/1     Running   0          46m

$ oc get pvc -n ocp-8pvc-app
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
volume1   Bound    pvc-05aaa9c4-79d3-413b-8ddf-374a6cf30f0b   1Gi        RWO            gp3-csi        114s
volume2   Bound    pvc-9f86600b-5c67-4034-833a-4d8f3f5ab2e3   1Gi        RWO            gp3-csi        114s
volume3   Bound    pvc-5122c09a-5e51-41b8-9249-fc80adf4a1c2   1Gi        RWO            gp3-csi        114s
volume4   Bound    pvc-b5c8239e-b43a-4159-9370-b925373a37bc   1Gi        RWO            gp3-csi        114s
volume5   Bound    pvc-57aa8a7d-aca9-4d67-a878-3e476557ea1b   1Gi        RWO            gp3-csi        114s
volume6   Bound    pvc-175204ab-2b5a-4632-abb1-f33013b6d309   1Gi        RWO            gp3-csi        114s
volume7   Bound    pvc-a24a7aa2-da3e-46e9-986d-d28957feb74b   1Gi        RWO            gp3-csi        114s
volume8   Bound    pvc-7176f1b7-099c-4e2b-b3c7-bf4dd602aeeb   1Gi        RWO            gp3-csi        113s
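The exact application manifest is not attached; a minimal sketch of a pod mounting a PVC like the ones above (pod name, image, and mount path are hypothetical, abridged to one volume) would be:

$ cat <<'EOF' | oc apply -n ocp-8pvc-app -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: volume1
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3-csi
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-pvcs          # hypothetical name
spec:
  containers:
  - name: app
    image: registry.access.redhat.com/ubi9/ubi   # hypothetical image
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: vol1
      mountPath: /data/vol1    # hypothetical mount path
  volumes:
  - name: vol1
    persistentVolumeClaim:
      claimName: volume1
EOF

Repeat for volume2 through volume8 to match the eight PVCs used here.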
2. Create an FSB backup with restic/kopia
$ oc get backup test-backup1 -o yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/resource-timeout: 10m0s
    velero.io/source-cluster-k8s-gitversion: v1.26.9+636f2be
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "26"
  creationTimestamp: "2023-11-03T11:28:18Z"
  generation: 7
  labels:
    velero.io/storage-location: default
  name: test-backup1
  namespace: openshift-adp
  resourceVersion: "160865"
  uid: 6c0f06cc-fc79-454a-9a8c-d317513de50c
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: true
  includedNamespaces:
  - ocp-8pvc-app
  itemOperationTimeout: 4h0m0s
  snapshotMoveData: false
  storageLocation: default
  ttl: 720h0m0s
status:
  completionTimestamp: "2023-11-03T11:35:46Z"
  expiration: "2023-12-03T11:28:18Z"
  formatVersion: 1.1.0
  phase: Completed
  progress:
    itemsBackedUp: 87
    totalItems: 87
  startTimestamp: "2023-11-03T11:28:18Z"
  version: 1
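A Backup CR with defaultVolumesToFsBackup: true like the one above is typically created with something along these lines (standard velero CLI flags; the exact invocation used in this run is an assumption):

$ velero backup create test-backup1 \
    --include-namespaces ocp-8pvc-app \
    --default-volumes-to-fs-backup \
    -n openshift-adp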
3. Remove the app namespace and create a restore.
$ oc delete ns ocp-8pvc-app
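Then create the restore from the backup (the create command is not shown above; presumably something like):

$ velero restore create test-restore1 --from-backup test-backup1 -n openshift-adp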
4. Wait until the PodVolumeRestore CRs get created
$ oc get podvolumerestore -w
NAME                  NAMESPACE      POD                              UPLOADER TYPE   VOLUME    STATUS   TOTALBYTES   BYTESDONE   AGE
test-restore1-6vxk6   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume5                                     0s
test-restore1-r7zzr   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume8                                     0s
test-restore1-pgzbp   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume4                                     0s
test-restore1-5m56l   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume1                                     0s
test-restore1-96chn   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume2                                     0s
test-restore1-rvw2s   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume6                                     0s
test-restore1-wjt8q   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume7                                     0s
test-restore1-x58h5   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume3                                     0s
5. Bounce the velero pod as soon as the PodVolumeRestore CRs are created
$ oc delete pod velero-6d4b46949b-tb57s --force
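Once the replacement velero pod starts, its log should contain the same wording as the failureReason shown in the next step; a quick way to check (grep pattern assumed to match the server's log message):

$ oc logs deploy/velero -n openshift-adp | grep -i 'mark it as'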
6. Verify the restore is Failed
$ oc get restore test-restore1 -o yaml
apiVersion: velero.io/v1
kind: Restore
metadata:
  creationTimestamp: "2023-11-03T11:41:30Z"
  finalizers:
  - restores.velero.io/external-resources-finalizer
  generation: 5
  name: test-restore1
  namespace: openshift-adp
  resourceVersion: "164684"
  uid: d853eece-e593-4d9a-ba34-7d8d21adef78
spec:
  backupName: test-backup1
  excludedResources:
  - nodes
  - events
  - events.events.k8s.io
  - backups.velero.io
  - restores.velero.io
  - resticrepositories.velero.io
  - csinodes.storage.k8s.io
  - volumeattachments.storage.k8s.io
  - backuprepositories.velero.io
  itemOperationTimeout: 4h0m0s
status:
  completionTimestamp: "2023-11-03T11:41:57Z"
  failureReason: found a restore with status "InProgress" during the server starting, mark it as "Failed"
  phase: Failed
  progress:
    itemsRestored: 39
    totalItems: 39
  startTimestamp: "2023-11-03T11:41:30Z"
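For scripting, the failure can be asserted directly from the status fields above:

$ oc get restore test-restore1 -n openshift-adp \
    -o jsonpath='{.status.phase}: {.status.failureReason}{"\n"}'
Failed: found a restore with status "InProgress" during the server starting, mark it as "Failed"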
Actual results:
The PodVolumeRestores keep transferring data and reach Completed for all the volumes, despite the Failed restore:
$ oc get podvolumerestore
NAME                  NAMESPACE      POD                              UPLOADER TYPE   VOLUME    STATUS      TOTALBYTES   BYTESDONE   AGE
test-restore1-5m56l   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume1   Completed   104857654    104857654   10m
test-restore1-6vxk6   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume5   Completed   104857654    104857654   10m
test-restore1-96chn   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume2   Completed   104857654    104857654   10m
test-restore1-pgzbp   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume4   Completed   104857654    104857654   10m
test-restore1-r7zzr   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume8   Completed   104857654    104857654   10m
test-restore1-rvw2s   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume6   Completed   104857654    104857654   10m
test-restore1-wjt8q   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume7   Completed   104857654    104857654   10m
test-restore1-x58h5   ocp-8pvc-app   app-with-8pvc-577f9cd7db-qk2b7   restic          volume3   Completed   104857654    104857654   10m
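The listing can also be scoped to this restore via the velero.io/restore-name label that Velero applies to PodVolumeRestores (label name assumed for this build):

$ oc get podvolumerestore -n openshift-adp -l velero.io/restore-name=test-restore1 \
    -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,BYTESDONE:.status.progress.bytesDone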
Expected results:
A PodVolumeRestore should not continue to progress after its restore is marked as Failed.
Additional info:
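A plausible explanation, not yet confirmed: PodVolumeRestores are reconciled by the node-agent daemonset pods on each node, not by the velero server pod, so force-deleting the velero pod does not interrupt the data movers. On restart the server only marks the parent Restore as Failed; it does not cancel the in-flight PodVolumeRestores. If that is the case, the node-agent logs (daemonset name assumed; it was restic in earlier releases) should show the transfers continuing past the failure:

$ oc logs ds/node-agent -n openshift-adp | grep -i podvolumerestore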