[OADP-2505] OADP-1.2.2: ACM cluster restore is broken due to restore order - Red Hat Issue Tracker

Type: Bug
Resolution: Done-Errata
Priority: Blocker
Fix Version/s: OADP 1.2.2
Affects Version/s: OADP 1.2
Component/s: acm-cr
Labels:
- triaged

Blocked:
False
Blocked Reason:
None
Ready:
False
Fixed in Build:
oadp-operator-bundle-container-1.2.2-19
QEStatus:
ToDo
Intelligence Requested:
Market:

Severity:
Critical
WSJF:
0
Risk Probability:
Very Likely
Risk Score:
0
Cost of Delay:
10

Workstream:

None

Root Cause:
Unset
Failure Category:
Unknown

Regression:
No

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem: ACM applications are removed and re-created on managed cluster(s) after restore

Version-Release number of selected component (if applicable):

OCP 4.10.65 + FIPS enabled
ACM 2.8.1-FC6
OADP 1.2

Steps to Reproduce:

Hub running ACM 2.8.1-FC6 with custom self-signed ingress and API serving certs, with managed clusters provisioned
Apps created and propagated onto managed clusters >> OK
Back up the hub >> OK
Destroy and rebuild the hub using the same name, but with the default ingress and API serving certs
Activate restore on the rebuilt hub

Actual results:

Apps on managed cluster(s) are deleted and re-created upon restore activation

Additional info:

OADP 1.2 completes backup and restore faster than older versions. We did not see this issue with previous OADP versions, even though they had no prioritized restore order for ACM resources. Our guess is that the OADP performance change causes the new behaviour: some ACM resources are now restored before others, which leads to the applications being removed from the managed clusters.

The managed cluster related resources must be restored first, in the following order. They are common foundation resources that may be widely used by other components.

managedcluster.cluster.open-cluster-management.io
klusterletaddonconfig.agent.open-cluster-management.io
managedclusteraddon.addon.open-cluster-management.io
managedcluster.clusterview.open-cluster-management.io

The other hive/bare metal/observability resources don't need to be prioritized.
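
As a rough illustration of the requirement above, the following Python sketch checks whether an observed restore sequence handled the four managed cluster types before any dependent type. The function name, the `dependents` argument, and the sample resource `subscriptions.apps.open-cluster-management.io` are assumptions made for this example and do not come from the issue. The relative order among the four types is deliberately not checked.

```python
# Illustrative helper only; not part of OADP, Velero, or ACM.
FOUNDATION = {
    "managedcluster.cluster.open-cluster-management.io",
    "klusterletaddonconfig.agent.open-cluster-management.io",
    "managedclusteraddon.addon.open-cluster-management.io",
    "managedcluster.clusterview.open-cluster-management.io",
}


def foundation_restored_first(sequence, dependents):
    """Return True if every foundation type found in `sequence`
    appears before every dependent type found in `sequence`."""
    positions = {name: i for i, name in enumerate(sequence)}
    found = [positions[r] for r in FOUNDATION if r in positions]
    deps = [positions[d] for d in dependents if d in positions]
    if not found or not deps:
        # Nothing to compare, so no ordering violation is possible.
        return True
    return max(found) < min(deps)


# Hypothetical dependent type, used only to exercise the check.
DEPENDENTS = ["subscriptions.apps.open-cluster-management.io"]

good = [
    "namespaces",
    "managedcluster.cluster.open-cluster-management.io",
    "klusterletaddonconfig.agent.open-cluster-management.io",
    "subscriptions.apps.open-cluster-management.io",
]
bad = [
    "namespaces",
    "subscriptions.apps.open-cluster-management.io",
    "managedcluster.cluster.open-cluster-management.io",
]
```

Here `foundation_restored_first(good, DEPENDENTS)` holds, while the `bad` sequence fails the check because a dependent type precedes a managed cluster type.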

Resources to be restored in the following order <CONFIRMED>:

securitycontextconstraints,customresourcedefinitions,namespaces,

managedcluster.cluster.open-cluster-management.io,
managedcluster.clusterview.open-cluster-management.io,
klusterletaddonconfig.agent.open-cluster-management.io,
managedclusteraddon.addon.open-cluster-management.io,

storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,configmaps,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,endpoints,services,-,clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
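
To make the effect of this list concrete, here is a minimal Python sketch of how a prioritized restore order could be derived from such a string. It assumes that entries before the lone `-` are restored first in the listed order, entries after it are restored last, and unlisted types fall in between in alphabetical order. This is a simplified model, not Velero's actual implementation, and the function names and the sample type `subscriptions.apps.open-cluster-management.io` are hypothetical.

```python
# Simplified model of a prioritized restore order; not Velero code.
PRIORITIES = (
    "securitycontextconstraints,customresourcedefinitions,namespaces,"
    "managedcluster.cluster.open-cluster-management.io,"
    "managedcluster.clusterview.open-cluster-management.io,"
    "klusterletaddonconfig.agent.open-cluster-management.io,"
    "managedclusteraddon.addon.open-cluster-management.io,"
    "storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,"
    "volumesnapshotcontents.snapshot.storage.k8s.io,"
    "volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,"
    "persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,"
    "configmaps,limitranges,pods,replicasets.apps,"
    "clusterclasses.cluster.x-k8s.io,endpoints,services,-,"
    "clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,"
    "clusterresourcesets.addons.cluster.x-k8s.io"
)


def parse_priorities(spec):
    """Split a comma-separated priority string into (high, low) lists,
    using a lone '-' entry as the separator (assumed semantics)."""
    items = [s.strip() for s in spec.split(",") if s.strip()]
    if "-" in items:
        i = items.index("-")
        return items[:i], items[i + 1:]
    return items, []


def restore_order(backed_up, spec):
    """Order the backed-up resource types: high-priority entries first,
    unlisted types alphabetically, low-priority entries last."""
    high, low = parse_priorities(spec)
    listed = set(high) | set(low)
    first = [r for r in high if r in backed_up]
    middle = sorted(r for r in backed_up if r not in listed)
    last = [r for r in low if r in backed_up]
    return first + middle + last


# Hypothetical backup contents, used only to exercise the sketch.
sample = {
    "secrets",
    "managedcluster.cluster.open-cluster-management.io",
    "namespaces",
    "subscriptions.apps.open-cluster-management.io",
    "clusters.cluster.x-k8s.io",
}
ordered = restore_order(sample, PRIORITIES)
```

Under these assumptions, `ordered` places `namespaces` and the managed cluster type ahead of `secrets`, then the unlisted subscription type, and finally `clusters.cluster.x-k8s.io`.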

is cloned by

OADP-2686 OADP-1.3.0: ACM cluster restore is broken due to restore order

Closed

links to

openshift/oadp-operator#1147: OADP-2505: Add additional types to default restore priorities

RHBA-2023:118617 OpenShift API for Data Protection (OADP) 1.2.2 security and bug fix update

mentioned on

Merge request - Updated 2 upstream sources

Assignee: Scott Seago

Reporter: Sahar Ebrahimi

QA Contact: Amos Mastbaum

Votes: 0

Watchers: 10

Created: 2023/08/24 4:26 PM

Updated: 2025/03/30 3:17 PM

Resolved: 2023/10/16 3:01 PM
