  1. OpenShift API for Data Protection
  2. OADP-2686

OADP-1.3.0: ACM cluster restore is broken due to restore order


    • Bug
    • Resolution: Done-Errata
    • Blocker
    • OADP 1.3.0
    • OADP 1.2
    • acm-cr
    • False
    • False
    • oadp-operator-bundle-container-1.3.0-117
    • ToDo
    • Critical
    • 10
    • 0
    • Very Likely
    • 0
    • None
    • Unset
    • Unknown
    • No

      Description of problem: ACM applications are removed and re-created on managed cluster(s) after restore

       

      Version-Release number of selected component (if applicable):

      OCP 4.10.65 + FIPS enabled
      ACM 2.8.1-FC6
      OADP 1.2

      Steps to Reproduce:

      Hub running ACM 2.8.1-FC6 with custom self-signed ingress and API serving certs, with managed clusters provisioned
      Apps created and propagated onto managed clusters >> OK
      Back up the hub >> OK
      Destroy and rebuild the hub using the same name, but with the default ingress and API serving certs
      Activate restore on the rebuilt hub

      Actual results:

      Apps on managed cluster(s) are deleted and re-created upon restore activation

      Additional info:

      OADP 1.2 runs backup and restore faster than older versions. We did not see this issue with previous OADP versions, even though no prioritized restore order was defined for ACM resources. We suspect the performance change is causing the new behaviour: some ACM resources are now restored before others, which leads to the applications being removed from the managed clusters.

      The managed-cluster-related resources must be restored first, in the following order. They are common foundation resources that may be widely used by other components.

      managedcluster.cluster.open-cluster-management.io
      klusterletaddonconfig.agent.open-cluster-management.io
      managedclusteraddon.addon.open-cluster-management.io
      managedcluster.clusterview.open-cluster-management.io

      The other Hive, bare-metal, and observability resources don't need to be prioritized.

      Resources are to be restored in the following order (CONFIRMED):

      securitycontextconstraints,customresourcedefinitions,namespaces,

      managedcluster.cluster.open-cluster-management.io,
      managedcluster.clusterview.open-cluster-management.io,
      klusterletaddonconfig.agent.open-cluster-management.io,
      managedclusteraddon.addon.open-cluster-management.io,

      storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,volumesnapshotcontents.snapshot.storage.k8s.io,volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,configmaps,limitranges,pods,replicasets.apps,clusterclasses.cluster.x-k8s.io,endpoints,services,-,clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,clusterresourcesets.addons.cluster.x-k8s.io
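      The effect of the confirmed list can be illustrated with a minimal Python sketch. It assumes the list is consumed as a Velero-style restore resource priority string, in which a lone "-" entry separates high-priority resources (restored first, in the listed order) from low-priority resources (restored last), and everything unlisted is restored in between in alphabetical order. The function names and the sample resource applications.app.k8s.io are hypothetical and used only for illustration; this is not Velero's or OADP's actual implementation.

```python
from typing import Iterable, List, Tuple

# Priority string copied from the confirmed order in this report.
RESTORE_PRIORITIES = (
    "securitycontextconstraints,customresourcedefinitions,namespaces,"
    "managedcluster.cluster.open-cluster-management.io,"
    "managedcluster.clusterview.open-cluster-management.io,"
    "klusterletaddonconfig.agent.open-cluster-management.io,"
    "managedclusteraddon.addon.open-cluster-management.io,"
    "storageclasses,volumesnapshotclass.snapshot.storage.k8s.io,"
    "volumesnapshotcontents.snapshot.storage.k8s.io,"
    "volumesnapshots.snapshot.storage.k8s.io,datauploads.velero.io,"
    "persistentvolumes,persistentvolumeclaims,serviceaccounts,secrets,"
    "configmaps,limitranges,pods,replicasets.apps,"
    "clusterclasses.cluster.x-k8s.io,endpoints,services,-,"
    "clusterbootstraps.run.tanzu.vmware.com,clusters.cluster.x-k8s.io,"
    "clusterresourcesets.addons.cluster.x-k8s.io"
)


def parse_priorities(spec: str) -> Tuple[List[str], List[str]]:
    """Split a comma-separated priority string into (high, low) lists.

    Assumption: a lone "-" entry marks the boundary between the
    high-priority and the low-priority resources.
    """
    items = [s.strip() for s in spec.split(",") if s.strip()]
    if "-" in items:
        idx = items.index("-")
        return items[:idx], items[idx + 1:]
    return items, []


def restore_order(resources: Iterable[str], spec: str) -> List[str]:
    """Return the resources in the order this sketch would restore them."""
    high, low = parse_priorities(spec)
    present = set(resources)
    first = [r for r in high if r in present]          # listed order
    last = [r for r in low if r in present]            # listed order
    middle = sorted(present - set(high) - set(low))    # unlisted: alphabetical
    return first + middle + last


if __name__ == "__main__":
    backed_up = [
        "secrets",
        "managedclusteraddon.addon.open-cluster-management.io",
        "applications.app.k8s.io",  # illustrative unlisted resource
        "managedcluster.cluster.open-cluster-management.io",
        "clusters.cluster.x-k8s.io",
        "namespaces",
    ]
    for name in restore_order(backed_up, RESTORE_PRIORITIES):
        print(name)
```

      Under these assumptions the managed cluster resources are restored right after namespaces and before any unlisted resource such as the application objects, which is the ordering this report asks for.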

              sseago Scott Seago
              saharebrahimi Sahar Ebrahimi
              Amos Mastbaum Amos Mastbaum
              Votes: 0
              Watchers: 5

                Created:
                Updated:
                Resolved: