Bug
Resolution: Won't Do
Major
Description of problem:
The Restic timeout is set to 15 hours in the DPA. When the timeout is reached while a backup is running, the Backup CR status becomes 'PartiallyFailed' and describing the Backup CR shows:
"error: /timed out waiting for all PodVolumeBackups to complete".
However, the data upload to the S3 bucket continued until it finished.
Total backup time: ~18 hours
Version-Release number of selected component (if applicable):
OCP 4.12.9
ODF 4.12.3
OADP 1.2.0-79 (iib 504285)
Ceph-RBD
How reproducible:
Steps to Reproduce:
1. Create a large PV with data (tested with 3T usage on a 4T PV).
2. Set the restic timeout in the DPA.
3. Run a backup.
4. The backup fails when the timeout is reached.
5. Monitor the restic pod log and the S3 bucket: data is still being uploaded.
Actual results:
The backup fails, but the data upload continues.
Expected results:
The backup fails and the data upload stops.
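
The expected behavior can be illustrated with a minimal sketch: a long-running child process is stopped once its deadline passes, instead of being left to run to completion. This is not Velero's or restic's implementation. The helper name `run_with_deadline`, the grace period, and the sleeping child used as a stand-in for the upload are all assumptions made for illustration.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s, grace_s=5.0):
    """Run cmd; if it outlives timeout_s, stop it and report a timeout."""
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
        return "completed", proc.returncode
    except subprocess.TimeoutExpired:
        proc.terminate()  # ask the child to stop
        try:
            proc.wait(timeout=grace_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # force stop if the request was ignored
            proc.wait()
        return "timed out", proc.returncode


# Hypothetical stand-in for a long upload: a child that sleeps for 30 seconds.
fake_upload = [sys.executable, "-c", "import time; time.sleep(30)"]
status, code = run_with_deadline(fake_upload, timeout_s=1.0)
print(status, code)
```

In this sketch the child is gone by the time the timeout is reported, which is the behavior the report expects from the upload.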
Additional info:
Attached are node-agent graphs showing that the upload continues after the timeout is reached (18 hours in total).
(Note: there is a 3-hour offset between the Backup CR timestamps and the graphs.)
DPA Configuration
configuration:
  restic:
    enable: true
    podConfig:
      resourceAllocations:
        limits:
          cpu: 2
          memory: 32768Mi
        requests:
          cpu: 1
          memory: 16384Mi
    timeout: 900m
Backup CR - 15 hours timeout
status:
  completionTimestamp: "2023-06-06T10:04:16Z"
  errors: 1
  expiration: "2023-07-05T19:04:15Z"
  formatVersion: 1.1.0
  phase: PartiallyFailed
  startTimestamp: "2023-06-05T19:04:15Z"
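
As a quick check that the Backup CR was closed by the timeout rather than by the upload finishing, the timestamps above can be compared with the configured 900m. This is a small sketch using only the values quoted in this report; the variable names are illustrative.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%dT%H:%M:%SZ"

# Values copied from the Backup CR status and the DPA configuration.
start = datetime.strptime("2023-06-05T19:04:15Z", FMT)
completion = datetime.strptime("2023-06-06T10:04:16Z", FMT)
timeout = timedelta(minutes=900)

elapsed = completion - start
print(elapsed)            # 15:00:01
print(elapsed - timeout)  # 0:00:01
```

The Backup CR completes one second after the 15-hour timeout, which matches a timeout-driven completion.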
Restic.log
time="2023-06-05T19:04:24Z" level=info msg="PodVolumeBackup starting" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:92" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd
time="2023-06-05T19:04:25Z" level=info msg="Looking for most recent completed PodVolumeBackup for this PVC" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:229" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd pvcUID=998bd2ce-810d-450d-8e76-1bface7feb76
time="2023-06-05T19:04:25Z" level=info msg="No completed PodVolumeBackup found for PVC" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:270" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd pvcUID=998bd2ce-810d-450d-8e76-1bface7feb76
time="2023-06-05T19:04:25Z" level=info msg="No parent snapshot found for PVC, not based on parent snapshot for this backup" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:174" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd
time="2023-06-06T13:11:28Z" level=info msg="Run command=restic backup --repo=s3:https://s3-openshift-storage.apps.vlan611.rdu2.scalelab.redhat.com/oadp-bucket/velero/restic/perf-busy-data-cephrbd-1pod-3t --password-file=/tmp/credentials/openshift-adp/velero-repo-credentials-repository-password --cache-dir=/scratch/.cache/restic . --tag=backup=restic-backup-rbd-1pod-3t --tag=backup-uid=16f043a4-cc6d-4bdf-adc6-f9fa9a51cc70 --tag=ns=perf-busy-data-cephrbd-1pod-3t --tag=pod=busy-data-rbd-1pod-3t-1-7bc4b4f6c7-9mtq6 --tag=pod-uid=87d830c1-9cff-4152-bd77-3f3be513df37 --tag=pvc-uid=998bd2ce-810d-450d-8e76-1bface7feb76 --tag=volume=vol-0 --host=velero --json --insecure-tls=true, stdout={\"message_type\":\"summary\",\"files_new\":10530001,\"files_changed\":0,\"files_unmodified\":0,\"dirs_new\":302,\"dirs_changed\":0,\"dirs_unmodified\":0,\"data_blobs\":10530001,\"tree_blobs\":303,\"data_added\":3239911610014,\"total_files_processed\":10530001,\"total_bytes_processed\":3234816000359,\"total_duration\":65222.014351503,\"snapshot_id\":\"0cfdb1e44b9eece89b4a7c3616395c5b5d7225c6c6d4ea1b8a90c63d985f6b2f\"}, stderr=" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/uploader/provider/restic.go:157" parentSnapshot= path="/host_pods/87d830c1-9cff-4152-bd77-3f3be513df37/volumes/kubernetes.io~csi/pvc-998bd2ce-810d-450d-8e76-1bface7feb76/mount" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd
time="2023-06-06T13:11:28Z" level=info msg="PodVolumeBackup completed" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:213" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd
time="2023-06-06T13:11:28Z" level=info msg="PodVolumeBackup starting" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:92" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd
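
The restic summary in the log above can be used to quantify how long the upload ran past the timeout. The sketch below parses a subset of the summary fields copied from the log line and assumes the 900m timeout from the DPA; the variable names are illustrative.

```python
import json

# Subset of the restic summary fields, copied from the log line above.
summary_json = (
    '{"message_type":"summary",'
    '"total_bytes_processed":3234816000359,'
    '"total_duration":65222.014351503}'
)
summary = json.loads(summary_json)

TIMEOUT_S = 900 * 60  # 900m from the DPA configuration

duration_s = summary["total_duration"]
total_hours = duration_s / 3600
overrun_hours = (duration_s - TIMEOUT_S) / 3600
rate_mb_s = summary["total_bytes_processed"] / duration_s / 1e6

print(f"upload ran for {total_hours:.2f} h")            # 18.12 h
print(f"past the timeout by {overrun_hours:.2f} h")     # 3.12 h
print(f"average throughput {rate_mb_s:.1f} MB/s")       # 49.6 MB/s
```

By these numbers the upload kept running for roughly three hours after the Backup CR had already been marked PartiallyFailed.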