Bug
Resolution: Unresolved
Minor
OADP 1.3.3, OADP 1.4.1
ToDo
Very Likely
Description of problem:
I tried running a backup while the BSL was in the Unavailable phase. Velero processed the entire backup request and only marked the backup as Failed at the very end, when it tried to write to object storage.
Slack discussion:
https://redhat-internal.slack.com/archives/C0144ECKUJ0/p1730725233888819
Version-Release number of selected component (if applicable):
OADP 1.4.1
How reproducible:
Always
Steps to Reproduce:
1. Create a DPA that references a non-existent bucket.
$ oc get dpa ts-dpa -o yaml
apiVersion: oadp.openshift.io/v1alpha1
kind: DataProtectionApplication
metadata:
  creationTimestamp: "2024-11-06T11:02:12Z"
  generation: 2
  name: ts-dpa
  namespace: openshift-adp
  resourceVersion: "13822947"
  uid: d36fa2b9-20b6-457d-b0f0-1d4344c548f6
spec:
  backupLocations:
  - name: default
    velero:
      config:
        profile: noobaa
        region: noobaa
        s3ForcePathStyle: "true"
        s3Url: http://s3-openshift-storage.apps.mig-oadp-2316d.rhos-psi.cnv-qe.rhood.us
      credential:
        key: cloud
        name: cloud-credentials
      default: true
      objectStorage:
        bucket: migoadp2316djgp7g1
        prefix: velero
      provider: aws
  configuration:
    nodeAgent:
      enable: true
      uploaderType: kopia
    velero:
      defaultPlugins:
      - openshift
      - aws
      - csi
status:
  conditions:
  - lastTransitionTime: "2024-11-06T11:02:13Z"
    message: Reconcile complete
    reason: Complete
    status: "True"
    type: Reconciled
2. Verify that the BSL is in the Unavailable phase.
$ oc get bsl default -o yaml
apiVersion: velero.io/v1
kind: BackupStorageLocation
metadata:
  creationTimestamp: "2024-11-06T11:02:12Z"
  generation: 18
  labels:
    app.kubernetes.io/component: bsl
    app.kubernetes.io/instance: default
    app.kubernetes.io/managed-by: oadp-operator
    app.kubernetes.io/name: oadp-operator-velero
    openshift.io/oadp: "True"
    openshift.io/oadp-registry: "True"
  name: default
  namespace: openshift-adp
  ownerReferences:
  - apiVersion: oadp.openshift.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: DataProtectionApplication
    name: ts-dpa
    uid: d36fa2b9-20b6-457d-b0f0-1d4344c548f6
  resourceVersion: "13828551"
  uid: bfb24f78-f433-4122-8b06-add2683d0128
spec:
  config:
    checksumAlgorithm: ""
    profile: noobaa
    region: noobaa
    s3ForcePathStyle: "true"
    s3Url: http://s3-openshift-storage.apps.mig-oadp-2316d.rhos-psi.cnv-qe.rhood.us
  credential:
    key: cloud
    name: cloud-credentials
  default: true
  objectStorage:
    bucket: migoadp2316djgp7g1
    prefix: velero
  provider: aws
status:
  lastValidationTime: "2024-11-06T11:16:16Z"
  message: 'BackupStorageLocation "default" is unavailable: rpc error: code = Unknown desc = operation error S3: ListObjectsV2, https response error StatusCode: 404, RequestID: m35s9up6-eyg2so-w7j, HostID: m35s9up6-eyg2so-w7j, NoSuchBucket: '
  phase: Unavailable
3. Deploy an application and trigger a backup of its namespace.
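The check in step 2 can also be scripted so that a backup is not triggered against an unavailable BSL. The following is a minimal sketch, assuming the BSL has been fetched as JSON (for example with `oc get bsl default -o json`) and parsed into a dictionary. `bsl_phase` and `require_available_bsl` are hypothetical helper names, not part of Velero or OADP, and the sample object is a trimmed copy of the BSL above with an abbreviated message.

```python
import json


def bsl_phase(bsl: dict) -> str:
    """Return status.phase of a BackupStorageLocation, or "" if it is not set."""
    return (bsl.get("status") or {}).get("phase", "")


def require_available_bsl(bsl: dict) -> None:
    """Raise RuntimeError unless the BSL reports the Available phase."""
    phase = bsl_phase(bsl)
    if phase != "Available":
        name = (bsl.get("metadata") or {}).get("name", "<unknown>")
        message = (bsl.get("status") or {}).get("message", "")
        raise RuntimeError(f"BSL {name!r} is {phase or 'not validated'}: {message}")


# Trimmed JSON form of the BSL from step 2 (message abbreviated for the example).
sample = json.loads("""
{
  "metadata": {"name": "default", "namespace": "openshift-adp"},
  "status": {"phase": "Unavailable", "message": "NoSuchBucket"}
}
""")

try:
    require_available_bsl(sample)
except RuntimeError as err:
    print(f"refusing to start backup: {err}")
```

This only works around the problem on the client side; it does not change how Velero itself handles the request.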
Actual results:
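The Velero logs and the backup status show that Velero processed the whole backup request before failing. In the status below, status.progress.itemsBackedUp equals status.progress.totalItems (62 of 62), and the failure is only reported when Velero tries to write velero-backup.json to the bucket.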
$ oc get backup test-backup1 -o yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  annotations:
    velero.io/resource-timeout: 10m0s
    velero.io/source-cluster-k8s-gitversion: v1.29.8+632b078
    velero.io/source-cluster-k8s-major-version: "1"
    velero.io/source-cluster-k8s-minor-version: "29"
  creationTimestamp: "2024-11-06T11:12:35Z"
  generation: 5
  labels:
    velero.io/storage-location: default
  name: test-backup1
  namespace: openshift-adp
  resourceVersion: "13826552"
  uid: f45e9d8c-1a59-48f3-b732-afbe756e9571
spec:
  csiSnapshotTimeout: 10m0s
  defaultVolumesToFsBackup: false
  includedNamespaces:
  - ocp-django
  itemOperationTimeout: 4h0m0s
  snapshotMoveData: false
  storageLocation: default
  ttl: 720h0m0s
status:
  errors: 2
  expiration: "2024-12-06T11:12:35Z"
  failureReason: 'rpc error: code = Unknown desc = error putting object velero/backups/test-backup1/velero-backup.json: operation error S3: PutObject, https response error StatusCode: 404, RequestID: m35s5565-38lqzg-vds, HostID: m35s5565-38lqzg-vds, api error NoSuchBucket: The specified bucket does not exist.'
  formatVersion: 1.1.0
  hookStatus: {}
  phase: Failed
  progress:
    itemsBackedUp: 62
    totalItems: 62
  startTimestamp: "2024-11-06T11:12:35Z"
  version: 1
  warnings: 1
Velero logs:
time="2024-11-06T11:12:37Z" level=info msg="Backed up a total of 62 items" backup=openshift-adp/test-backup1 logSource="/remote-source/velero/app/pkg/backup/backup.go:499" progress=
time="2024-11-06T11:12:37Z" level=info msg="Setting up backup store to persist the backup" backup=openshift-adp/test-backup1 logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:738"
time="2024-11-06T11:12:37Z" level=error msg="Error uploading log file" backup=test-backup1 bucket=migoadp2316djgp7g1 error="rpc error: code = Unknown desc = error putting object velero/backups/test-backup1/test-backup1-logs.gz: operation error S3: PutObject, https response error StatusCode: 404, RequestID: m35s555m-2x69zr-18z6, HostID: m35s555m-2x69zr-18z6, api error NoSuchBucket: The specified bucket does not exist." error.file="/remote-source/app/velero-plugin-for-aws/object_store.go:279" error.function="main.(*ObjectStore).PutObject" logSource="/remote-source/velero/app/pkg/persistence/object_store.go:256" prefix=velero
time="2024-11-06T11:12:37Z" level=info msg="Initial backup processing complete, moving to FinalizingPartiallyFailed" backup=openshift-adp/test-backup1 logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:752"
time="2024-11-06T11:12:37Z" level=info msg="plugin process exited" backup=openshift-adp/test-backup1 cmd=/velero id=174 logSource="/remote-source/velero/app/pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/velero
time="2024-11-06T11:12:37Z" level=info msg="plugin process exited" backup=openshift-adp/test-backup1 cmd=/plugins/velero-plugins id=185 logSource="/remote-source/velero/app/pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/plugins/velero-plugins
time="2024-11-06T11:12:37Z" level=info msg="plugin process exited" backup=openshift-adp/test-backup1 cmd=/plugins/velero-plugin-for-aws id=196 logSource="/remote-source/velero/app/pkg/plugin/clientmgmt/process/logrus_adapter.go:80" plugin=/plugins/velero-plugin-for-aws
time="2024-11-06T11:12:37Z" level=error msg="backup failed" backuprequest=openshift-adp/test-backup1 controller=backup error="rpc error: code = Unknown desc = error putting object velero/backups/test-backup1/velero-backup.json: operation error S3: PutObject, https response error StatusCode: 404, RequestID: m35s5565-38lqzg-vds, HostID: m35s5565-38lqzg-vds, api error NoSuchBucket: The specified bucket does not exist." logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:306"
time="2024-11-06T11:12:37Z" level=info msg="Updating backup's final status" backuprequest=openshift-adp/test-backup1 controller=backup logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:325"
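The ordering of these entries is the core of the report: item collection finishes before the first write to object storage is attempted. The sketch below parses logfmt-style lines with the Python standard library to surface that ordering. The two sample lines are shortened copies of entries from the Velero logs in this report, and `parse_logfmt` is a hypothetical helper, not a Velero tool.

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Split a key=value log line into a dict, honoring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            fields[key] = value
    return fields


# Shortened copies of two entries from the Velero logs in this report.
lines = [
    'time="2024-11-06T11:12:37Z" level=info msg="Backed up a total of 62 items" '
    "backup=openshift-adp/test-backup1",
    'time="2024-11-06T11:12:37Z" level=error msg="Error uploading log file" '
    "backup=test-backup1 bucket=migoadp2316djgp7g1",
]

entries = [parse_logfmt(line) for line in lines]

# Index of the first error entry; everything before it ran without complaint.
first_error = next(i for i, entry in enumerate(entries) if entry["level"] == "error")

for entry in entries[: first_error + 1]:
    print(entry["level"], "-", entry["msg"])
```

Running the same parser over the full log would show the "Backed up a total of 62 items" entry ahead of the first NoSuchBucket error.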
Expected results:
Velero should exit early, without processing any items, if the BSL is in the Unavailable phase.
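The requested behavior can be pictured as a guard at the start of backup processing. This is a sketch of the control flow only, not Velero's implementation: `process_backup`, `fake_collect`, the item names, and the dictionary shapes are hypothetical, and reporting the rejection through the existing FailedValidation backup phase is an assumption about how a fix might surface the error.

```python
from typing import Callable, List


def process_backup(bsl_phase: str, collect_items: Callable[[], List[str]]) -> dict:
    """Return a backup status dict, skipping item collection if the BSL is not Available."""
    if bsl_phase != "Available":
        # Early exit: no items are collected and nothing is written to object storage.
        return {
            "phase": "FailedValidation",  # assumption: reuse Velero's validation phase
            "validationErrors": [
                f"backup storage location is {bsl_phase or 'not validated'}"
            ],
        }
    items = collect_items()
    return {
        "phase": "Completed",
        "progress": {"itemsBackedUp": len(items), "totalItems": len(items)},
    }


calls = []


def fake_collect() -> List[str]:
    """Stand-in for the expensive item collection; records that it was invoked."""
    calls.append("collect")
    return ["deployment/django", "service/django"]  # illustrative item names


status = process_backup("Unavailable", fake_collect)
print(status["phase"], status["validationErrors"], "collect calls:", len(calls))
```

With a guard like this, the backup in this report would have been rejected before any of the 62 items were processed.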
Additional info: