Bug
Resolution: Unresolved
Moderate
Description of problem:
While running a restoration exercise for an ACM hub with around 240 managed clusters, the restore ran for several hours (restoring 3053 of 3113 items) before ultimately failing because the TTL of the backups in use expired and Velero deleted them.
According to internal discussions, this is many times longer than the expected restoration time, and we would like to understand why.
Version-Release number of selected component (if applicable):
2.10
How reproducible:
In both tested runs the restore took several hours on this environment.
Steps to Reproduce:
- ...
Actual results:
Expected results:
Additional info:
The first time, we suspected that Velero was struggling because we had connected it to the backup store in ReadOnly mode to keep it from deleting the backups it would use. This restoration eventually completed successfully.
The second time, we did not connect to the backup store in ReadOnly mode but set the veleroTtl value to 10 hours, which we thought would be more than sufficient (~2 hours to redeploy the ACM hub and run basic sanity and health checks, 8 hours for the restoration). The restore still took several hours and outlasted the TTL of the backup.
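For reference, the TTL budget described above can be sketched as a rough calculation. This is only an illustration, not a measurement: it assumes the 10-hour TTL clock started at the beginning of the exercise, and that the restore had consumed the full 8-hour window when it stopped at 3053 of 3113 items. The exact timestamps are not recorded in this report, and the function names are hypothetical.

```python
from datetime import timedelta


def restore_budget(ttl, elapsed_before_restore):
    """Time left for the restore before the backup's TTL expires."""
    return ttl - elapsed_before_restore


def projected_restore_time(items_total, items_done, elapsed):
    """Linearly extrapolate the total restore time from observed progress."""
    if items_done <= 0:
        raise ValueError("no restore progress observed yet")
    return elapsed * (items_total / items_done)


# Values taken from this report.
ttl = timedelta(hours=10)   # veleroTtl
prep = timedelta(hours=2)   # redeploy the hub, sanity and health checks
budget = restore_budget(ttl, prep)

# ASSUMPTION: the whole budget had elapsed at 3053 of 3113 items.
projected = projected_restore_time(3113, 3053, budget)

print("restore budget:", budget)
print("projected restore time:", projected)
print("fits within TTL:", projected <= budget)
```

Under these assumptions, a linear extrapolation puts the full restore slightly beyond the 8-hour window, so the TTL would need additional headroom even if the restore rate itself were acceptable.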