[OADP-3071] Performance issues when restoring 30k resources for the first time

Type: Bug
Resolution: Obsolete
Priority: Critical
Fix Version/s: OADP 1.5.0
Affects Version/s: OADP 1.3.0
Component/s: velero
Labels: triaged
Story Points: 1
Blocked: False
Blocked Reason: None
Ready: False
QEStatus: ToDo
Intelligence Requested:
Market:
WSJF: 0
Risk Probability: Very Likely
Risk Score: 0
Workstream: None
Root Cause: Unset
Failure Category: Unknown
Regression: Yes
SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

Follow-up to the bug https://issues.redhat.com/browse/OADP-1167

The first restore (without the existing-resource-policy: update flag) takes 55 min, double the OADP 1.1.0 result. This is a regression.

The 2nd and 3rd restores, run with the existing-resource-policy: update flag, take 28 min, half the OADP 1.1.0 result.

See https://issues.redhat.com/browse/OADP-1167?focusedId=21683376&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-21683376

Version-Release number of selected component (if applicable):

OCP 4.12.9
ODF 4.12.9-rhodf
OADP 1.3.0-138

How reproducible:

Steps to Reproduce:
1. Create a namespace with 33K secrets
2. Run a backup
3. Delete the namespace
4. Run the 1st restore
5. Run a few more restores with the existing-resource-policy: update flag
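
For illustration, the two kinds of restore in steps 4 and 5 could be expressed as Velero Restore resources along the following lines. This is a minimal sketch, not the manifests used in the test: the restore names, the backup name, and the `openshift-adp` namespace are assumptions, and `restore_manifest` is a hypothetical helper. The only difference between the two objects is the `existingResourcePolicy: update` field in the spec.

```python
import json


def restore_manifest(name, backup_name, update_existing=False):
    """Build a Velero Restore object as a plain dict (hypothetical helper)."""
    spec = {"backupName": backup_name}
    if update_existing:
        # Corresponds to the existing-resource-policy: update flag in step 5.
        spec["existingResourcePolicy"] = "update"
    return {
        "apiVersion": "velero.io/v1",
        "kind": "Restore",
        # Namespace is an assumption; use the namespace where OADP is installed.
        "metadata": {"name": name, "namespace": "openshift-adp"},
        "spec": spec,
    }


# Step 4: first restore, namespace deleted, no policy set.
first = restore_manifest("restore-new1", "backup-33k-secrets")
# Step 5: repeated restores over resources that already exist.
update = restore_manifest("restore-update1", "backup-33k-secrets", update_existing=True)

print(json.dumps(first, indent=2))
print(json.dumps(update, indent=2))
```

The printed JSON can be saved to a file and applied with `oc apply -f`, since the API accepts JSON as well as YAML.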

Actual results:

The first restore completed OK, but it took twice as long as on OADP 1.1.0.

Expected results:

The first restore completes OK and takes no longer than it did on OADP 1.1.0.

Additional info:

- csi-acm-restore-rbd-33k-secrerts-ns1-new1.tar.gz (1.28 MB, 2023/11/13 12:40 PM)
- csi-acm-restore-rbd-33k-secrerts-ns1-update1.tar.gz (1.38 MB, 2023/11/13 12:40 PM)
- csi-acm-restore-rbd-33k-secrerts-ns1-update2.tar.gz (1.49 MB, 2023/11/13 12:40 PM)
- csi-acm-restore-rbd-33k-secrerts-ns1-update3.tar.gz (1.49 MB, 2023/11/13 12:40 PM)

links to

openshift/openshift-docs#67404: OADP 2419 Release Notes and Update Notes fixed

Scott Seago added a comment - 2024/01/10 7:49 PM

Just to clarify: the additional time needed to restore new resources is not a consequence of fixing existing resources. It is the result of a change added in Velero 1.11 (first shipped in OADP 1.2), before any of the existing-resource performance work was done. Velero now restores the managed fields struct for resources as well, but this cannot be done in the original `Create` call, because that field is discarded there. The resource must therefore be updated post-creation, which doubles the number of API calls required per item and roughly doubles the time required to restore each (non-PVC) resource.
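
As a rough back-of-the-envelope check of this explanation against the numbers in the description, restore time can be modeled as items × API calls per item × time per call. The per-call latency below is an assumption, back-calculated from the reported 55 min for 33K secrets at two calls per item; it was not measured, and the model ignores everything except per-item API calls.

```python
def estimated_restore_minutes(items, calls_per_item, seconds_per_call):
    """Naive model: total time is dominated by sequential per-item API calls."""
    return items * calls_per_item * seconds_per_call / 60


ITEMS = 33_000
# Assumption: 55 min * 60 s / (33,000 items * 2 calls) = 0.05 s per call.
SECONDS_PER_CALL = 0.05

create_only = estimated_restore_minutes(ITEMS, 1, SECONDS_PER_CALL)
create_then_update = estimated_restore_minutes(ITEMS, 2, SECONDS_PER_CALL)

print(f"create only:        {create_only:.1f} min")
print(f"create then update: {create_then_update:.1f} min")
```

Under these assumptions the model gives about 27.5 min for one call per item and about 55 min for two, which matches the reported "double the OADP 1.1.0 result" for a first restore.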

Wes Hayutin added a comment - 2024/01/09 1:16 AM

OK, I spoke to Scott about this bug. We are going to look at it, but not in the immediate cycles, so I have to kick this out. https://redhat-internal.slack.com/archives/C0144ECKUJ0/p1704743947152359

Wes Hayutin added a comment - 2024/01/08 8:42 PM

Yes, performance can always be faster, but the slowdown in performance is a direct result of fixing restoring existing resources in https://issues.redhat.com/browse/OADP-1167

Scott Seago added a comment - 2024/01/08 7:57 PM

sseago No, the informer cache change improved performance for resources that already exist in the cluster. This is referencing the slowdown in performance for new resources.

There is no resolution for this, although I wouldn't consider it a regression. Velero fixed a bug (managed fields weren't being set properly), but to do this requires patching the resource post-creation, which means twice as many API calls per resource on restore. Velero is doing more things than before, therefore it takes longer.

Wes Hayutin added a comment - 2024/01/08 7:05 PM - edited 2024/01/08 7:06 PM

sseago I believe this is fixed in 1.3.0. Or is this also an informer cache setting issue at this point?

Wes Hayutin added a comment - 2024/01/05 5:25 PM

I believe the required code is upstream at this time. Leaving this in 1.3.2 for us to double-check.

Scott Seago added a comment - 2023/11/13 2:41 PM

This is a result of a change made in Velero 1.11/OADP 1.2 to restore managed fields. Managed fields are not set in the create call, so Velero has to patch the resource post-creation. As a result, restoring a resource that does not already exist in the cluster takes approximately twice as long as in OADP 1.1.

The upstream issue: https://github.com/vmware-tanzu/velero/issues/5701

The restore code responsible for this new functionality: https://github.com/vmware-tanzu/velero/blob/main/pkg/restore/restore.go#L1743-L1762
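
The create-then-patch flow described in this comment can be illustrated with a toy simulation. This is a sketch only and is not Velero's restore code: `FakeAPIServer`, `restore_item`, and `make_secret` are hypothetical names, and the in-memory server simply drops `managedFields` on create to mimic the behavior described above.

```python
import copy


class FakeAPIServer:
    """Toy in-memory stand-in for the cluster API; counts every request."""

    def __init__(self):
        self.objects = {}
        self.calls = 0

    def create(self, obj):
        self.calls += 1
        stored = copy.deepcopy(obj)
        # Mimics the behavior described above: managed fields sent on create are dropped.
        stored["metadata"].pop("managedFields", None)
        self.objects[stored["metadata"]["name"]] = stored

    def patch_managed_fields(self, name, managed_fields):
        self.calls += 1
        self.objects[name]["metadata"]["managedFields"] = copy.deepcopy(managed_fields)


def restore_item(api, backed_up, restore_managed_fields=True):
    """Restore one item: one create, plus one patch if managed fields are restored."""
    api.create(backed_up)
    fields = backed_up["metadata"].get("managedFields")
    if restore_managed_fields and fields:
        api.patch_managed_fields(backed_up["metadata"]["name"], fields)


def make_secret(i):
    return {
        "kind": "Secret",
        "metadata": {
            "name": f"secret-{i}",
            "managedFields": [{"manager": "kubectl", "operation": "Update"}],
        },
    }


old = FakeAPIServer()  # behavior before the change: create only
new = FakeAPIServer()  # behavior after the change: create, then patch
for i in range(100):
    restore_item(old, make_secret(i), restore_managed_fields=False)
    restore_item(new, make_secret(i))

print(old.calls, new.calls)  # 100 200
```

With one extra request per item, the request count for new resources doubles, while only the second server ends up with `managedFields` populated on the restored objects.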

Mordechai Lehrer added a comment - 2023/11/13 12:05 PM

removing 'regression' label and applying 'regression' field.

Assignee: Wes Hayutin
Reporter: David Vaanunu
QA Contact: David Vaanunu
Votes: 0
Watchers: 6
Created: 2023/11/13 11:04 AM
Updated: 2025/03/30 3:13 PM
Resolved: 2024/08/28 6:21 PM
