  1. OpenShift API for Data Protection
  2. OADP-1798

Velero restart during CSI backup with error "panic: reflect: slice index out of range"


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • OADP 1.2.0
    • OADP 1.2.0
    • velero

      Description of problem:

      While running a CSI backup of a namespace with 1000 pods, the Velero server restarted.

      The Backup CR failed with the error "failureReason": "found a backup with status \"InProgress\" during the server starting, mark it as \"Failed\""

      Version-Release number of selected component (if applicable):

      OCP 4.12.9, ODF 4.12.2, OADP 1.2.0-48 

      Using CephRBD

       

      How reproducible:

       

      Steps to Reproduce:
      1. Create a namespace with 1000 busybox pods
      2. Run a CSI backup of that namespace
      3. Verify the Backup status

      Actual results:

      Velero was restarted during the backup.

      Backup failed

      Expected results:

      Velero does not restart and the backup completes

      Additional info:

      Attached logs & DPA config

      Velero log:

      panic: reflect: slice index out of range [recovered]
              panic: reflect: slice index out of range [recovered]
              panic: reflect: slice index out of range

      goroutine 2007 [running]:
      sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
              /remote-source/velero/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:118 +0x1f4
      panic({0x20f9ae0, 0x2adc450})
              /usr/lib/golang/src/runtime/panic.go:884 +0x212
      encoding/json.(*encodeState).marshal.func1()
              /usr/lib/golang/src/encoding/json/encode.go:327 +0x6e

       level=error msg="fail to recreate VolumeSnapshotContent snapcontent-72bf9226-6b39-4a32-ac6d-65d8add49ef7: fail to retrieve VolumeSnapshotContent snapcontent-72bf9226-6b39-4a32-ac6d-65d8add49ef7 info: timed out waiting for the condition" backup=openshift-adp/csi-backup-rbd-1000pvs-iter3 controller=backup-finalizer logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:1073
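
The panic message in the log above is the one that Go's `reflect.Value.Index` raises when it is asked for an element at or beyond the length of a slice. The report does not state a root cause, so the sketch below only reproduces the message in isolation; it is not the Velero code path. The helper name `indexPanicMessage` is hypothetical. One possible, unconfirmed way for this to surface inside `encoding/json` is a slice being changed by another goroutine while it is being encoded.

```go
package main

import (
	"fmt"
	"reflect"
)

// indexPanicMessage calls reflect.Value.Index(i) on s and returns the
// recovered panic value as a string, or "" if no panic occurred.
func indexPanicMessage(s []string, i int) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	_ = reflect.ValueOf(s).Index(i)
	return ""
}

func main() {
	items := []string{"a", "b"}
	fmt.Println(indexPanicMessage(items, 1) == "") // in range, prints "true"
	fmt.Println(indexPanicMessage(items, 2))       // prints "reflect: slice index out of range"
}
```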

              spampatt@redhat.com Shubham Pampattiwar
              dvaanunu@redhat.com David Vaanunu