Bug
Resolution: Done
Minor
OADP 1.1.1
Description of problem:
A restore ended in PartiallyFailed after 14 successful restores.
Error found in the restore:
error preparing volumesnapshotclasses.snapshot.storage.k8s.io/test-849-snapclass: rpc error: code = Unknown desc = timed out waiting for the condition
Error found in the VolumeSnapshot (VS) and VolumeSnapshotContent (VSContent):
error: message: 'Failed to check and update snapshot content: failed to list snapshot for content velero-velero-cassandra-data-cassandra-2-pr2xh-5vgqm: "rpc error: code = Internal desc = Could not list snapshots: InvalidParameterValue: Value ( 0 ) for parameter maxResults is invalid. Expecting a value greater than 5.\n\tstatus code: 400, request id: 13913292-0cd1-49f6-86be-d4d2bba20aa6"'
Here is the DPA:
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  creationTimestamp: "2022-11-09T08:47:38Z"
  generation: 1
  name: ts-dpa
  namespace: openshift-adp
  resourceVersion: "95617"
  uid: 4b3895ab-ae60-46dd-b8d3-37e4dc42bd96
spec:
  backupLocations:
  - velero:
      config:
        region: us-east-2
      credential:
        key: cloud
        name: cloud-credentials
      default: true
      objectStorage:
        bucket: oadpbucket154245
        prefix: velero-e2e-22cc15a0-600b-11ed-a776-5405db5be9ea
      provider: aws
  configuration:
    restic:
      enable: true
      podConfig:
        resourceAllocations: {}
    velero:
      defaultPlugins:
      - openshift
      - aws
      - kubevirt
      - csi
  features:
    dataMover:
      enable: true
  podDnsConfig: {}
  snapshotLocations: []
status:
  conditions:
  - lastTransitionTime: "2022-11-09T08:47:38Z"
    message: Reconcile complete
    reason: Complete
    status: "True"
    type: Reconciled
Here are some errors from Velero:
oc logs deploy/velero -n openshift-adp | grep error
Defaulted container "velero" out of: velero, openshift-velero-plugin (init), velero-plugin-for-aws (init), kubevirt-velero-plugin (init), velero-plugin-for-csi (init)
time="2022-11-09T08:48:11Z" level=error msg="Current BackupStorageLocations available/unavailable/unknown: 0/0/1)" controller=backup-storage-location logSource="/remote-source/velero/app/pkg/controller/backup_storage_location_controller.go:173"
time="2022-11-09T11:03:34Z" level=error msg="Timed out awaiting reconciliation of volumesnapshotrestore cassandra-ns/vsr-79txk" cmd=/plugins/velero-plugin-for-csi logSource="/remote-source/app/internal/util/util.go:498" pluginName=velero-plugin-for-csi restore=openshift-adp/test-849-dzjhq
time="2022-11-09T11:03:34Z" level=error msg="Timed out awaiting reconciliation of volumesnapshotrestore cassandra-ns/vsr-mxwrl" cmd=/plugins/velero-plugin-for-csi logSource="/remote-source/app/internal/util/util.go:498" pluginName=velero-plugin-for-csi restore=openshift-adp/test-849-dzjhq
time="2022-11-09T11:03:34Z" level=error msg="failed to wait for VolumeSnapshotRestores to be completed: timed out waiting for the condition" cmd=/plugins/velero-plugin-for-csi logSource="/remote-source/app/internal/util/util.go:531" pluginName=velero-plugin-for-csi restore=openshift-adp/test-849-dzjhq
time="2022-11-09T11:03:43Z" level=error msg="Cluster resource restore error: error preparing volumesnapshotclasses.snapshot.storage.k8s.io/test-849-snapclass: rpc error: code = Unknown desc = timed out waiting for the condition" logSource="/remote-source/velero/app/pkg/controller/restore_controller.go:500" restore=openshift-adp/test-849-dzjhq
time="2022-11-09T11:57:38Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/test-849-dzjhq-08724d8a-62ad-47e3-9759-618e94d99761 error="downloadrequests.velero.io \"test-849-dzjhq-08724d8a-62ad-47e3-9759-618e94d99761\" not found" logSource="/remote-source/velero/app/pkg/controller/download_request_controller.go:74"
time="2022-11-09T11:59:30Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/test-849-dzjhq-83a362bd-4f58-471c-9c47-00783a03eec9 error="downloadrequests.velero.io \"test-849-dzjhq-83a362bd-4f58-471c-9c47-00783a03eec9\" not found" logSource="/remote-source/velero/app/pkg/controller/download_request_controller.go:74"
time="2022-11-09T11:59:45Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/test-849-dzjhq-1f971b8e-415d-4de3-a024-a2b98f730fe1 error="downloadrequests.velero.io \"test-849-dzjhq-1f971b8e-415d-4de3-a024-a2b98f730fe1\" not found" logSource="/remote-source/velero/app/pkg/controller/download_request_controller.go:74"
time="2022-11-09T12:02:39Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/test-849-dzjhq-fa0fa64a-67fb-46ec-9d51-6842720fd257 error="downloadrequests.velero.io \"test-849-dzjhq-fa0fa64a-67fb-46ec-9d51-6842720fd257\" not found" logSource="/remote-source/velero/app/pkg/controller/download_request_controller.go:74"
time="2022-11-09T12:03:23Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/test-849-dzjhq-bf5f52ac-c375-4da9-aebc-483fec8bab0e error="downloadrequests.velero.io \"test-849-dzjhq-bf5f52ac-c375-4da9-aebc-483fec8bab0e\" not found" logSource="/remote-source/velero/app/pkg/controller/download_request_controller.go:74"
time="2022-11-09T12:15:01Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/test-849-dzjhq-ccc4c57c-5e72-423d-8fc4-f51ae430b5fc error="downloadrequests.velero.io \"test-849-dzjhq-ccc4c57c-5e72-423d-8fc4-f51ae430b5fc\" not found" logSource="/remote-source/velero/app/pkg/controller/download_request_controller.go:74"
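For triage, the Velero error lines above can be grouped by message so that the repeated "Error updating download request" entries do not hide the CSI timeouts. The following is only a minimal sketch, assuming the key=value layout with optionally double-quoted values seen in the output above; the helper names parse_logfmt and summarize_errors are hypothetical and not part of Velero or OADP.

```python
import shlex
from collections import Counter


def parse_logfmt(line):
    """Parse one key=value log line into a dict.

    Assumes the layout seen in the Velero output above: values may be
    double-quoted and may contain backslash-escaped quotes. Tokens
    without '=' (for example the "Defaulted container" banner) are
    skipped. A line with unbalanced quotes yields an empty dict.
    """
    try:
        tokens = shlex.split(line)
    except ValueError:
        return {}
    fields = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def summarize_errors(lines):
    """Count level=error entries per distinct msg value."""
    counts = Counter()
    for line in lines:
        fields = parse_logfmt(line)
        if fields.get("level") == "error":
            counts[fields.get("msg", "")] += 1
    return counts
```

Feeding the saved output of the oc logs command above to summarize_errors (one log entry per list element) gives a per-message count, which makes it easier to separate the restore failure from unrelated noise.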
I attached must-gather for further investigation.
Version-Release number of selected component (if applicable):
OADP 1.1.1 Bundle: 1.1.1-39
Volsync 0.5.1 Red Hat build
How reproducible:
Rerun the DataMover backup and restore until the restore ends in PartiallyFailed (14 times in this bug).
Steps to Reproduce:
1. Create a DataMover backup.
2. Delete the namespace and trigger a restore; repeat multiple times.
Actual results:
The restore ends in PartiallyFailed after several successful restores.
Expected results: