  1. OpenShift API for Data Protection
  2. OADP-821

CSI backup of a namespace with 1000 pods/PVCs fails with a timeout error



      Description of problem:

      When running a CSI backup of a namespace with 1000 pods, the backup ends with the status "PartiallyFailed".

      Error Message:
      main-backup-scheduler-1000pods-every-2hrs-20220928-082124/backup-scheduler-1000pods-every-2hrs-20220928083022/backup-scheduler-1000pods-every-2hrs-20220928083022.log:time="2022-09-28T10:10:02Z" level=error msg="fail to recreate VolumeSnapshotContent snapcontent-7a455d87-5e00-42ba-b54c-3b16ba91df71: fail to retrieve VolumeSnapshotContent snapcontent-7a455d87-5e00-42ba-b54c-3b16ba91df71 info: timed out waiting for the condition" backup=openshift-adp/backup-scheduler-1000pods-every-2hrs-20220928083022 logSource="pkg/controller/backup_controller.go:985".
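
      To see how many snapshots were affected, the backup log can be scanned for this error. The following is a minimal sketch that assumes every failure is logged in the same format as the line above; the function name failed_snapshot_contents is hypothetical and is not part of Velero or OADP.

```python
import re

# Matches the error line format shown in this report and captures the
# VolumeSnapshotContent name. Other error formats are not handled.
_PATTERN = re.compile(
    r'level=error msg="fail to recreate VolumeSnapshotContent '
    r'(snapcontent-[0-9a-f-]+):'
)


def failed_snapshot_contents(log_lines):
    """Return the set of VolumeSnapshotContent names that failed to be recreated."""
    names = set()
    for line in log_lines:
        match = _PATTERN.search(line)
        if match:
            names.add(match.group(1))
    return names
```

      Passing an open log file (for example, failed_snapshot_contents(open(path))) returns the distinct snapshot content names, so the size of the result gives the number of affected volumes.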

      CSI backups of namespaces with 80, 90, and 100 pods all completed successfully.

      Version-Release number of selected component (if applicable):

      OCP 4.10.26

      OADP 1.1.0-74 

      How reproducible:


      Steps to Reproduce:
      1. Create a namespace with 1000 pods
      2. Run a CSI backup
      3. Check the backup status
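
      For step 1, one way to populate the namespace is to generate the manifests with a script. This is only a sketch of that idea, not the tooling used for this report: the namespace, storage class, volume size, container image, and resource names are all placeholder assumptions.

```python
import json


def build_manifests(namespace, count, storage_class, size="1Gi"):
    """Return PVC and Pod manifests as dicts, one PVC/Pod pair per index."""
    manifests = []
    for i in range(count):
        pvc_name = f"pvc-{i}"
        manifests.append({
            "apiVersion": "v1",
            "kind": "PersistentVolumeClaim",
            "metadata": {"name": pvc_name, "namespace": namespace},
            "spec": {
                "accessModes": ["ReadWriteOnce"],
                "storageClassName": storage_class,  # assumed CSI-backed class
                "resources": {"requests": {"storage": size}},
            },
        })
        manifests.append({
            "apiVersion": "v1",
            "kind": "Pod",
            "metadata": {"name": f"pod-{i}", "namespace": namespace},
            "spec": {
                "containers": [{
                    "name": "sleeper",
                    # Placeholder image; any image with a sleep binary works.
                    "image": "registry.access.redhat.com/ubi8/ubi-minimal",
                    "command": ["sleep", "infinity"],
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }],
                "volumes": [{
                    "name": "data",
                    "persistentVolumeClaim": {"claimName": pvc_name},
                }],
            },
        })
    return manifests


def to_list_document(manifests):
    """Wrap the manifests in a v1 List and serialize it as JSON."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2)
```

      Printing to_list_document(build_manifests("perf-test", 1000, "my-csi-sc")) and piping the output to "oc apply -f -" would create the pods and PVCs, assuming the namespace already exists.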

      Actual results:

      Backup fails with the "PartiallyFailed" status

      Expected results:

      Backup finishes with the "Completed" status

      Additional info:


      Ran a few iterations with a 10-minute timeout and the backups completed (using a private Velero build)
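
      The effect of the timeout can be illustrated with a generic poll-until-ready loop. This is a simplified sketch and not Velero's actual code: poll_until and WaitTimeout are hypothetical names, and the clock and sleep parameters exist only so the loop can be exercised without real waiting.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still false at the deadline."""


def poll_until(condition, timeout_s, interval_s,
               clock=time.monotonic, sleep=time.sleep):
    """Call condition() every interval_s seconds until it returns True.

    Raises WaitTimeout if the condition is still false once timeout_s
    seconds have elapsed.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return
        if clock() >= deadline:
            raise WaitTimeout("timed out waiting for the condition")
        sleep(interval_s)
```

      If a snapshot only becomes retrievable after the deadline, which is plausible when 1000 snapshots are processed at once, a short timeout_s raises the error while a longer one such as 600 seconds lets the same wait succeed.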

      upstream issue: https://github.com/vmware-tanzu/velero/issues/5416




            emcmulla@redhat.com Emily McMullan
            dvaanunu@redhat.com David Vaanunu