  1. OpenShift API for Data Protection
  2. OADP-2064

Restic continues to back up after the timeout is reached


    • Bug
    • Resolution: Won't Do
    • Major
    • OADP 1.2.6
    • None
    • restic
    • 1
    • False
    • False
    • ToDo
    • Important
    • 8
    • 8.000
    • Very Likely
    • 0
    • None
    • Unset
    • Unknown
    • No

      Description of problem:

      The Restic timeout is set to 15 hours (900m) in the DPA. When the timeout is reached while a backup is running, the Backup CR status becomes 'PartiallyFailed' and describing the Backup CR shows:

      "error: /timed out waiting for all PodVolumeBackups to complete".

      However, the data upload to the S3 bucket continued until it finished.
      Total backup time was ~18 hours.

      Version-Release number of selected component (if applicable):

      OCP 4.12.9

      ODF 4.12.3
      OADP 1.2.0-79 (iib 504285)
      Ceph-RBD

      How reproducible:

       

      Steps to Reproduce:
      1. Create a large PV with data (tested with 3T usage, 4T PV size)
      2. Set the restic timeout in the DPA
      3. Run a backup
      4. The backup fails on timeout
      5. Monitor the restic pod log and the S3 bucket: data is still being uploaded

      Actual results:

      The backup fails, but data continues to upload.

      Expected results:

      The backup fails and the data upload stops.
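
To illustrate the expected behaviour, here is a minimal sketch of a supervisor that stops its child process once a deadline passes. This is not Velero's or OADP's actual code. The function name `run_with_deadline` and the sleeping stand-in child are hypothetical and only show the pattern of cancelling the worker when the timeout fires.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd and stop it if it is still running after timeout_s seconds.

    Returns (finished, returncode). Hypothetical helper, for illustration only.
    """
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_s)
        return True, proc.returncode
    except subprocess.TimeoutExpired:
        # Deadline reached: ask the child to stop instead of letting it finish.
        proc.terminate()
        try:
            proc.wait(timeout=5)
        except subprocess.TimeoutExpired:
            # The child ignored the polite request, so force it to stop.
            proc.kill()
            proc.wait()
        return False, proc.returncode


if __name__ == "__main__":
    # Stand-in for a long-running upload: a child that sleeps for 30 seconds.
    child = [sys.executable, "-c", "import time; time.sleep(30)"]
    finished, returncode = run_with_deadline(child, timeout_s=0.5)
    print("finished:", finished, "returncode:", returncode)
```

With this pattern the work stops shortly after the deadline, rather than running to completion as observed in this report.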

      Additional info:

      The attached node-agent graphs show that the upload continues after the timeout is reached (18 hours in total).
      (Note: there is a 3-hour gap between the Backup CR timestamps and the graphs.)

      DPA Configuration
        configuration:
          restic:
            enable: true
            podConfig:
              resourceAllocations:
                limits:
                  cpu: 2
                  memory: 32768Mi
                requests:
                  cpu: 1
                  memory: 16384Mi
            timeout: 900m

      Backup CR - 15 hours timeout
      status:
        completionTimestamp: "2023-06-06T10:04:16Z"
        errors: 1
        expiration: "2023-07-05T19:04:15Z"
        formatVersion: 1.1.0
        phase: PartiallyFailed
        startTimestamp: "2023-06-05T19:04:15Z"
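
As a quick check of the timestamps above, the following sketch compares the Backup CR window with the 900m timeout from the DPA. The values are copied from this report. The helper name `parse_ts` is illustrative.

```python
from datetime import datetime, timedelta


def parse_ts(ts):
    """Parse a Kubernetes-style UTC timestamp such as 2023-06-05T19:04:15Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")


# Values copied from the Backup CR status and the DPA configuration.
start = parse_ts("2023-06-05T19:04:15Z")
completion = parse_ts("2023-06-06T10:04:16Z")
timeout = timedelta(minutes=900)  # DPA "timeout: 900m"

elapsed = completion - start
print("timeout:", timeout)                   # 15:00:00
print("backup CR window:", elapsed)          # 15:00:01
print("difference:", elapsed - timeout)      # 0:00:01
```

The Backup CR therefore completed (as PartiallyFailed) almost exactly when the 15-hour timeout expired.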

      Restic.log
      time="2023-06-05T19:04:24Z" level=info msg="PodVolumeBackup starting" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:92" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd

      time="2023-06-05T19:04:25Z" level=info msg="Looking for most recent completed PodVolumeBackup for this PVC" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:229" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd pvcUID=998bd2ce-810d-450d-8e76-1bface7feb76
      time="2023-06-05T19:04:25Z" level=info msg="No completed PodVolumeBackup found for PVC" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:270" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd pvcUID=998bd2ce-810d-450d-8e76-1bface7feb76
      time="2023-06-05T19:04:25Z" level=info msg="No parent snapshot found for PVC, not based on parent snapshot for this backup" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:174" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd
      time="2023-06-06T13:11:28Z" level=info msg="Run command=restic backup --repo=s3:https://s3-openshift-storage.apps.vlan611.rdu2.scalelab.redhat.com/oadp-bucket/velero/restic/perf-busy-data-cephrbd-1pod-3t --password-file=/tmp/credentials/openshift-adp/velero-repo-credentials-repository-password --cache-dir=/scratch/.cache/restic . --tag=backup=restic-backup-rbd-1pod-3t --tag=backup-uid=16f043a4-cc6d-4bdf-adc6-f9fa9a51cc70 --tag=ns=perf-busy-data-cephrbd-1pod-3t --tag=pod=busy-data-rbd-1pod-3t-1-7bc4b4f6c7-9mtq6 --tag=pod-uid=87d830c1-9cff-4152-bd77-3f3be513df37 --tag=pvc-uid=998bd2ce-810d-450d-8e76-1bface7feb76 --tag=volume=vol-0 --host=velero --json --insecure-tls=true, stdout={\"message_type\":\"summary\",\"files_new\":10530001,\"files_changed\":0,\"files_unmodified\":0,\"dirs_new\":302,\"dirs_changed\":0,\"dirs_unmodified\":0,\"data_blobs\":10530001,\"tree_blobs\":303,\"data_added\":3239911610014,\"total_files_processed\":10530001,\"total_bytes_processed\":3234816000359,\"total_duration\":65222.014351503,\"snapshot_id\":\"0cfdb1e44b9eece89b4a7c3616395c5b5d7225c6c6d4ea1b8a90c63d985f6b2f\"}, stderr=" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/uploader/provider/restic.go:157" parentSnapshot= path="/host_pods/87d830c1-9cff-4152-bd77-3f3be513df37/volumes/kubernetes.io~csi/pvc-998bd2ce-810d-450d-8e76-1bface7feb76/mount" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd
      time="2023-06-06T13:11:28Z" level=info msg="PodVolumeBackup completed" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:213" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd
      time="2023-06-06T13:11:28Z" level=info msg="PodVolumeBackup starting" backup=openshift-adp/restic-backup-rbd-1pod-3t controller=podvolumebackup logSource="/remote-source/velero/app/pkg/controller/pod_volume_backup_controller.go:92" podvolumebackup=openshift-adp/restic-backup-rbd-1pod-3t-8bpwd
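
As a rough cross-check of the log above, this sketch compares the `total_duration` in the restic summary with the wall-clock time between the first and last log entries and with the 900m timeout. All numbers are copied from this report. The variable names are illustrative.

```python
from datetime import datetime, timedelta

# Values copied from the restic summary in Restic.log.
total_duration_s = 65222.014351503   # "total_duration" (seconds)
data_added_bytes = 3239911610014     # "data_added" (bytes)
timeout = timedelta(minutes=900)     # DPA "timeout: 900m"

# First and last timestamps of the PodVolumeBackup in the log.
fmt = "%Y-%m-%dT%H:%M:%SZ"
pvb_start = datetime.strptime("2023-06-05T19:04:24Z", fmt)
pvb_end = datetime.strptime("2023-06-06T13:11:28Z", fmt)

run = timedelta(seconds=total_duration_s)
wall = pvb_end - pvb_start
overrun = run - timeout

run_hours = run.total_seconds() / 3600
overrun_hours = overrun.total_seconds() / 3600
throughput_mb_s = data_added_bytes / total_duration_s / 1e6

print(f"restic run time:   {run_hours:.2f} h")
print(f"log wall clock:    {wall}")
print(f"past the timeout:  {overrun_hours:.2f} h")
print(f"avg upload rate:   {throughput_mb_s:.1f} MB/s")
```

Under these numbers restic ran for roughly 18.1 hours, about 3.1 hours beyond the timeout, which matches the ~18 hours total reported in the description.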

       

       

        1. ResticBackupTimeout.tar
          5.58 MB
        2. restic_backup_network.jpeg
          91 kB
        3. restic_backup_memory.jpeg
          63 kB
        4. restic_backup_cpu.jpeg
          95 kB

              wnstb Wes Hayutin
              dvaanunu@redhat.com David Vaanunu