Bug
Resolution: Done
Major
OADP 1.2.0
Description of problem:
While running a CSI backup of a namespace with 1000 pods, the Velero server was restarted.
The Backup CR failed with the error "failureReason": "found a backup with status \"InProgress\" during the server starting, mark it as \"Failed\""
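To make the failure above easy to spot in automation, here is a minimal sketch that summarizes a Backup CR's outcome from its JSON form (for example, the output of `oc get backup <name> -n openshift-adp -o json`). The function name `backup_outcome` and the sample dictionary are hypothetical. Only the `status.phase` and `status.failureReason` fields are read, and the sample values are taken from the error quoted in this report.

```python
def backup_outcome(backup: dict) -> str:
    """Return the backup phase, with the failure reason appended when present."""
    status = backup.get("status", {})
    phase = status.get("phase", "Unknown")
    reason = status.get("failureReason")
    if reason:
        return f"{phase}: {reason}"
    return phase


# Hypothetical sample mirroring the failure quoted in this report.
sample = {
    "kind": "Backup",
    "status": {
        "phase": "Failed",
        "failureReason": (
            'found a backup with status "InProgress" during the server '
            'starting, mark it as "Failed"'
        ),
    },
}

if __name__ == "__main__":
    print(backup_outcome(sample))
```

A backup that is still running, or one whose status has not been populated yet, yields only the phase (or "Unknown"), so the same helper can be used while polling.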
Version-Release number of selected component (if applicable):
OCP 4.12.9, ODF 4.12.2, OADP 1.2.0-48
Using CephRBD
How reproducible:
Steps to Reproduce:
1. Create a namespace with 1000 pods (busybox pods)
2. Run a CSI backup of the namespace
3. Verify the Backup status
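Step 1 can be sketched as a small generator that emits the pod manifests as a single Kubernetes `List` in JSON. This is an illustration under assumptions, not the setup actually used for this report: the namespace name `perf-busybox`, the pod naming scheme, and the `sleep` command are all hypothetical. The volumes and PVCs that a CSI backup would snapshot are not included, because the report does not give their spec.

```python
import json


def build_pod_list(namespace: str, count: int, image: str = "busybox") -> dict:
    """Return a Kubernetes v1 List holding `count` busybox Pod manifests."""
    items = []
    for i in range(count):
        items.append({
            "apiVersion": "v1",
            "kind": "Pod",
            # Hypothetical naming scheme: busybox-0000 ... busybox-0999
            "metadata": {"name": f"busybox-{i:04d}", "namespace": namespace},
            "spec": {
                "containers": [{
                    "name": "busybox",
                    "image": image,
                    # Keep the container alive so the pod stays Running.
                    "command": ["sleep", "3600"],
                }],
            },
        })
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Assumed namespace name; it must already exist in the cluster.
    print(json.dumps(build_pod_list("perf-busybox", 1000)))
```

If the script is saved as, say, `gen_pods.py`, the output can be piped to `oc apply -f -` once the namespace has been created.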
Actual results:
Velero was restarted during the backup.
The backup failed.
Expected results:
Velero does not restart and the backup completes.
Additional info:
Logs and DPA config are attached.
Velero log:
panic: reflect: slice index out of range [recovered]
panic: reflect: slice index out of range [recovered]
panic: reflect: slice index out of range
goroutine 2007 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
/remote-source/velero/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.12.2/pkg/internal/controller/controller.go:118 +0x1f4
panic({0x20f9ae0, 0x2adc450})
/usr/lib/golang/src/runtime/panic.go:884 +0x212
encoding/json.(*encodeState).marshal.func1()
/usr/lib/golang/src/encoding/json/encode.go:327 +0x6e
level=error msg="fail to recreate VolumeSnapshotContent snapcontent-72bf9226-6b39-4a32-ac6d-65d8add49ef7: fail to retrieve VolumeSnapshotContent snapcontent-72bf9226-6b39-4a32-ac6d-65d8add49ef7 info: timed out waiting for the condition" backup=openshift-adp/csi-backup-rbd-1000pvs-iter3 controller=backup-finalizer logSource="/remote-source/velero/app/pkg/controller/backup_controller.go:1073