  1. OpenShift API for Data Protection
  2. OADP-713

Data mover poor performance: backup of 20 pods failed after a timeout of 2.5 hours


      Description of problem:

      The data mover failed with a timeout after 2.5 hours when backing up 20 pods in a single namespace.

      Backup status: 'PartiallyFailed'

      The VSB (VolumeSnapshotBackup) pods remain in 'Running' status.

      Related to OADP-644: VolumeSnapshotBackup and VolumeSnapshotRestore timeouts should be configurable

      Version-Release number of selected component (if applicable):

      OCP - 4.10.21

      OADP-1.1.0-59 (iib 289368)

      How reproducible:

       

      Steps to Reproduce:
      1. Enable the data mover in the DPA (DataProtectionApplication).
      2. Create a namespace with 20 pods.
      3. Start a backup.
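
For step 3, the backup request can be sketched as a Velero Backup manifest built in Python. This is only an illustration: the backup name, the openshift-adp namespace and the application namespace are taken from the Velero log in this report, while the spec the reporter actually used is not shown, so the single includedNamespaces entry is an assumption. The helper name build_backup_manifest is hypothetical.

```python
import json


def build_backup_manifest(name, namespace, included_namespaces):
    """Return a Velero Backup custom resource as a plain dict.

    Only includedNamespaces is set; every other spec field is left to
    Velero's defaults (an assumption, the real spec is not in the report).
    """
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Backup",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"includedNamespaces": list(included_namespaces)},
    }


# Names below come from the log lines in this report.
manifest = build_backup_manifest(
    "datamover-csi-ocs-cephrbd-20pods",
    "openshift-adp",
    ["busybox-perf-single-ns-20-pods"],
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `oc create -f`, assuming the DPA from step 1 is already reconciled.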

      Actual results:

      backup status 'PartiallyFailed'

      Expected results:

      backup status 'Completed'

      Additional info:

       

      Velero log:
      time="2022-08-10T13:28:34Z" level=error msg="Error backing up item" backup=openshift-adp/datamover-csi-ocs-cephrbd-20pods error="error executing custom action (groupResource=volumesnapshotbackups.datamover.oadp.openshift.io, namespace=busybox-perf-single-ns-20-pods, name=vsb-hltx7): rpc error: code = Unknown desc = timed out waiting for the condition" logSource="pkg/backup/backup.go:417" name=busybox-perf-single-ns-20-pods-9
      2022/08/10 13:38:40 error Timed out awaiting reconciliation of volumesnapshotbackup busybox-perf-single-ns-20-pods/vsb-nk52z
      2022/08/10 13:38:40 error Timed out awaiting reconciliation of volumesnapshotbackup busybox-perf-single-ns-20-pods/vsb-q9mmj
      2022/08/10 13:38:41 error failed to wait for VolumeSnapshotBackups to be completed: timed out waiting for the condition
      time="2022-08-10T13:38:41Z" level=error msg="timed out waiting for the condition" backup=openshift-adp/datamover-csi-ocs-cephrbd-20pods logSource="pkg/controller/backup_controller.go:660"
      time="2022-08-10T13:49:38Z" level=error msg="Error updating download request" controller=download-request downloadRequest=openshift-adp/datamover-csi-ocs-cephrbd-20pods-66a22949-94ae-4dc0-8a65-c2da5d6465e9 error="downloadrequests.velero.io \"datamover-csi-ocs-cephrbd-20pods-66a22949-94ae-4dc0-8a65-c2da5d6465e9\" not found" logSource="pkg/controller/download_request_controller.go:74"
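
To see which VolumeSnapshotBackups timed out, the log excerpt above can be scanned with a small script. The following is a minimal sketch, assuming the two message shapes seen in this excerpt are the only ones of interest; the function name timed_out_vsbs is hypothetical, and the first sample line is abbreviated (the backup= and logSource= fields are dropped).

```python
import re

# "Timed out awaiting reconciliation of volumesnapshotbackup <ns>/<name>"
VSB_TIMEOUT = re.compile(
    r"Timed out awaiting reconciliation of volumesnapshotbackup (\S+)/(\S+)"
)
# Velero custom-action error that names the VSB and ends in a timeout.
CUSTOM_ACTION = re.compile(
    r"groupResource=volumesnapshotbackups\.datamover\.oadp\.openshift\.io, "
    r"namespace=([^,]+), name=([^)]+)\): rpc error: code = Unknown "
    r"desc = timed out waiting for the condition"
)


def timed_out_vsbs(log_text):
    """Return a sorted list of (namespace, name) pairs for timed-out VSBs."""
    found = set()
    for pattern in (VSB_TIMEOUT, CUSTOM_ACTION):
        for namespace, name in pattern.findall(log_text):
            found.add((namespace, name))
    return sorted(found)


# Sample lines copied from the excerpt (first line abbreviated).
SAMPLE = r'''
time="2022-08-10T13:28:34Z" level=error msg="Error backing up item" error="error executing custom action (groupResource=volumesnapshotbackups.datamover.oadp.openshift.io, namespace=busybox-perf-single-ns-20-pods, name=vsb-hltx7): rpc error: code = Unknown desc = timed out waiting for the condition"
2022/08/10 13:38:40 error Timed out awaiting reconciliation of volumesnapshotbackup busybox-perf-single-ns-20-pods/vsb-nk52z
2022/08/10 13:38:40 error Timed out awaiting reconciliation of volumesnapshotbackup busybox-perf-single-ns-20-pods/vsb-q9mmj
'''

for namespace, name in timed_out_vsbs(SAMPLE):
    print(f"{namespace}/{name}")
```

Run against the full Velero log, the same function would list every affected VSB, which can then be compared with the 11 errors reported in the backup status.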

      backup:
          "status": {
              "completionTimestamp": "2022-08-10T13:38:41Z",
              "csiVolumeSnapshotsAttempted": 20,
              "csiVolumeSnapshotsCompleted": 20,
              "errors": 11,
              "expiration": "2022-09-09T11:09:33Z",
              "formatVersion": "1.1.0",
              "phase": "PartiallyFailed",
              "progress": {
                  "itemsBackedUp": 486,
                  "totalItems": 486
              },
              "startTimestamp": "2022-08-10T11:09:33Z",
              "version": 1
          }
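
The 2.5-hour figure in the summary can be checked against the timestamps in the status above. This is a small sketch using only the values shown in this report; the helper name parse_ts is hypothetical.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a timestamp such as 2022-08-10T11:09:33Z as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Fields copied from the backup status in this report.
status = {
    "startTimestamp": "2022-08-10T11:09:33Z",
    "completionTimestamp": "2022-08-10T13:38:41Z",
    "expiration": "2022-09-09T11:09:33Z",
    "errors": 11,
    "csiVolumeSnapshotsAttempted": 20,
    "csiVolumeSnapshotsCompleted": 20,
}

start = parse_ts(status["startTimestamp"])
elapsed = parse_ts(status["completionTimestamp"]) - start
retention = parse_ts(status["expiration"]) - start

print(f"elapsed: {elapsed} ({elapsed.total_seconds() / 3600:.2f} h)")
print(f"retention: {retention.days} days")
```

The elapsed time is 2:29:08, about 2.49 hours, and the completion timestamp coincides with the "failed to wait for VolumeSnapshotBackups to be completed" log entry at 13:38:41. All 20 CSI snapshots are reported as completed while 11 errors are recorded, which suggests the failures occur after snapshot creation, while waiting for the VolumeSnapshotBackups.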

            emcmulla@redhat.com Emily McMullan
            dvaanunu@redhat.com David Vaanunu
