Red Hat Advanced Cluster Management / ACM-11961

[2.10] [RDR] [Hub recovery] Auto import of managed clusters remains stuck on switching hubs


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Fix Version/s: ACM 2.10.4
    • Affects Version/s: ACM 2.10.0
    • Component/s: Business Continuity
    • Labels: None
    • Severity: Critical
    • No

      Description of problem:

      Related to https://issues.redhat.com/browse/ACM-11926

      Issue:
      The auto import operation doesn't work (it runs too early) when cleanupBeforeRestore is set to None on the ACM Restore resource. The auto import may also not complete properly when cleanupBeforeRestore is set to CleanupRestore but the cleanup operation finishes before the managed clusters backup is fully restored.

      Workaround:

      1. Create the ACM Restore resource (acm-restore) and wait for all Velero restore resources to show as Completed (cleanupBeforeRestore can be set to either None or CleanupRestore).
      2. Delete the acm-restore resource.
      3. Create a new Restore resource (same name or a different one); since all resources are already restored, the post-restore operation that runs the auto import of the managed clusters can complete for all clusters. A sketch of this sequence follows below.
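
      A minimal sketch of the workaround in Go, assuming a controller-runtime client whose scheme has the Velero API types registered; the function name, polling interval, and use of an unstructured object for the ACM Restore are illustrative only, not an official tool:

      // Sketch of the workaround described above; not the actual operator code.
      package workaround

      import (
          "context"
          "time"

          velerov1 "github.com/vmware-tanzu/velero/pkg/apis/velero/v1"
          "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
          "sigs.k8s.io/controller-runtime/pkg/client"
      )

      const ns = "open-cluster-management-backup"

      // runWorkaround creates the ACM Restore (passed as unstructured, using the
      // GVK from the manifests below), waits for all Velero restores to reach
      // Completed, then deletes and recreates the ACM Restore so the auto import
      // runs with everything already restored.
      func runWorkaround(ctx context.Context, c client.Client, acmRestore *unstructured.Unstructured) error {
          // Step 1: create the ACM Restore resource.
          if err := c.Create(ctx, acmRestore); err != nil {
              return err
          }
          // ...and wait until every Velero restore in the namespace is Completed.
          for {
              restores := velerov1.RestoreList{}
              if err := c.List(ctx, &restores, client.InNamespace(ns)); err != nil {
                  return err
              }
              done := len(restores.Items) > 0
              for i := range restores.Items {
                  done = done && restores.Items[i].Status.Phase == velerov1.RestorePhaseCompleted
              }
              if done {
                  break
              }
              time.Sleep(10 * time.Second)
          }
          // Step 2: delete the ACM Restore resource (in practice, also wait for
          // the deletion to finish before recreating).
          if err := c.Delete(ctx, acmRestore); err != nil {
              return err
          }
          // Step 3: recreate it; the post-restore auto import can now complete.
          fresh := acmRestore.DeepCopy()
          fresh.SetResourceVersion("")
          fresh.SetUID("")
          return c.Create(ctx, fresh)
      }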
      The fix:
      The ACM restore controller must check that all Velero restore resources are completed, and only then run the post-restore operation, which is: cleaning up delta resources, followed by the auto import operation for the managed clusters.
      The list of Velero restores must be refreshed when checking the overall status of the ACM restore; otherwise only the first created Velero restore (the credentials restore) is validated, and the post-restore operation starts as soon as the credentials backup is restored.
      The issue is visible when cleanupBeforeRestore is set to None on the ACM restore. In this case the ACM restore state is set to Finished as soon as the credentials restore completes, and since the post-restore step doesn't call the delta cleanup, the auto import operation (which would otherwise run after the resources cleanup) executes before the managed clusters are restored, so it does nothing.
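
      A minimal sketch of the gate the fix describes, continuing in the same package as the sketch above; the label used to find the child Velero restores is hypothetical, and this is not the actual cluster-backup-operator code:

      // restoreNameLabel is a hypothetical label tying child Velero restores to
      // the ACM Restore that created them; the real operator tracks this
      // association differently.
      const restoreNameLabel = "cluster.open-cluster-management.io/restore-name"

      // readyForPostRestore re-lists the Velero restores on every check, so the
      // status is no longer derived from the first (credentials) restore alone,
      // and reports true only when all of them are Completed.
      func readyForPostRestore(ctx context.Context, c client.Client, ns, acmRestoreName string) (bool, error) {
          restores := velerov1.RestoreList{}
          // Refresh the full list each time; restores created after the
          // credentials one (resources, generic resources, managed clusters)
          // must also be seen.
          if err := c.List(ctx, &restores, client.InNamespace(ns),
              client.MatchingLabels{restoreNameLabel: acmRestoreName}); err != nil {
              return false, err
          }
          if len(restores.Items) == 0 {
              return false, nil // nothing created yet
          }
          for i := range restores.Items {
              if restores.Items[i].Status.Phase != velerov1.RestorePhaseCompleted {
                  return false, nil // e.g. the managed clusters restore still running
              }
          }
          return true, nil // safe to run the delta cleanup and then the auto import
      }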

      How to reproduce the issue:

      1. Have a backup hub with one managed cluster; enable MSA and run a schedule to create backups.
      2. On a new hub, create a restore of everything, with cleanup set to None (see the restore-acm manifest below).
      3. If the issue is reproduced, the status of restore-acm doesn't show the messages info (post restore was executed but no managed clusters were found and processed); that is, status messages such as the following are missing:

      messages:
      - managed cluster amagrawa-c1-28my already available
      - Created auto-import-secret for (amagrawa-c2-my28)
      The restore-acm resource:

      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Restore
      metadata:
        name: restore-acm
        namespace: open-cluster-management-backup
      spec:
        cleanupBeforeRestore: None
        veleroManagedClustersBackupName: latest
        veleroCredentialsBackupName: latest
        veleroResourcesBackupName: latest

      A restore where this issue is reproduced (see the restore status missing the import section):

      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Restore
      metadata:
        creationTimestamp: "2024-05-30T15:47:04Z"
        generation: 1
        name: restore-acm
        namespace: open-cluster-management-backup
        resourceVersion: "5977827"
        uid: 10f5a1e9-587e-481f-86d5-0c098f4b0950
      spec:
        cleanupBeforeRestore: None
        veleroCredentialsBackupName: latest
        veleroManagedClustersBackupName: latest
        veleroResourcesBackupName: latest
      status:
        lastMessage: All Velero restores have run successfully
        phase: Finished
        veleroCredentialsRestoreName: restore-acm-acm-credentials-schedule-20240530153937-active
        veleroGenericResourcesRestoreName: restore-acm-acm-resources-generic-schedule-20240530153937
        veleroManagedClustersRestoreName: restore-acm-acm-managed-clusters-schedule-20240530153937
        veleroResourcesRestoreName: restore-acm-acm-resources-schedule-20240530153937
      The restore status should be:

      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Restore
      metadata:
        creationTimestamp: "2024-05-30T14:01:08Z"
        generation: 1
        name: restore-acm
        namespace: open-cluster-management-backup
        resourceVersion: "2822219"
        uid: 31d13ca1-1af2-4e90-87c3-c95ad756683d
      spec:
        cleanupBeforeRestore: None
        veleroCredentialsBackupName: latest
        veleroManagedClustersBackupName: latest
        veleroResourcesBackupName: latest
      status:
        lastMessage: All Velero restores have run successfully
        messages:
        - managed cluster amagrawa-c1-28my already available
        - Created auto-import-secret for (amagrawa-c2-my28)
        phase: Finished
        veleroCredentialsRestoreName: restore-acm-acm-credentials-schedule-20240530120055-active
        veleroGenericResourcesRestoreName: restore-acm-acm-resources-generic-schedule-20240530120055
        veleroManagedClustersRestoreName: restore-acm-acm-managed-clusters-schedule-20240530120055
        veleroResourcesRestoreName: restore-acm-acm-resources-schedule-20240530120055

        Version-Release number of selected component (if applicable):

      ACM 2.10.0

        How reproducible:

      Always

      Steps to Reproduce:

      1. Have a backup hub with one managed cluster; enable MSA and run a schedule to create backups.
      2. On a new hub, create a restore of everything, with cleanup set to None (see the restore-acm manifest above).
      3. If the issue is reproduced, the status of restore-acm doesn't show the messages info (post restore was executed but no managed clusters were found and processed).

      Actual results:

      The auto import of the managed clusters does not run after the restore; the restore-acm status is missing the auto import messages and the managed clusters remain stuck on the new hub.

      Expected results:

      The restore-acm status shows the auto import messages and all managed clusters are imported on the new hub.

      Additional info:

            Assignee: Valentina Birsan (vbirsan@redhat.com)
            Reporter: Valentina Birsan (vbirsan@redhat.com)
            Thuy Nguyen
            Votes: 0
            Watchers: 4
