Red Hat Advanced Cluster Management / ACM-9469

Adopted ManagedClusters Go to Unknown Status After Restore on a New ACM Hub



      Description of problem:

      When a restore that contains a ManagedCluster adopted through Hive's cluster adoption feature is applied to a new ACM hub, the ManagedCluster re-joins immediately as expected, but shortly afterward its AVAILABLE status changes to Unknown. After approximately two hours, the ManagedCluster becomes available again on its own, without manual intervention.

      Version-Release number of selected component (if applicable):


      How reproducible:


      Steps to Reproduce:

      1. Adopt a cluster into an ACM hub that has the cluster-backup operator installed and configured, following the Hive cluster adoption instructions: https://github.com/openshift/hive/blob/master/docs/using-hive.md#cluster-adoption
      2. Ensure that the cluster.open-cluster-management.io/backup=true label is applied to the adopted ManagedCluster's admin-kubeconfig secret.
      3. Have the cluster-backup operator take a backup of the ACM hub.
      4. Redeploy the ACM hub.
      5. Apply the restore to the newly deployed ACM hub.
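      Steps 2, 3, and 5 can be sketched as shell helpers. This is a minimal sketch, not the reporter's exact procedure: the `open-cluster-management-backup` namespace, the `schedule-acm` and `restore-acm` names, the schedule and TTL values, and the helper function names are assumptions chosen for illustration. The `Restore` sets all three Velero backup names to `latest` so that the managed-cluster data is restored along with credentials and resources.

```shell
#!/usr/bin/env bash
# Hedged sketch of reproduction steps 2, 3 and 5.
# ASSUMPTIONS (not from the report): namespace, CR names, schedule/TTL values,
# and the helper function names below are illustrative placeholders.
set -euo pipefail

BACKUP_NS="open-cluster-management-backup"  # assumed cluster-backup namespace

# Step 2: label the adopted cluster's admin-kubeconfig secret so it is backed up.
# $1 = adopted cluster's namespace, $2 = admin-kubeconfig secret name (placeholders).
label_admin_kubeconfig() {
  oc label secret "$2" -n "$1" cluster.open-cluster-management.io/backup=true --overwrite
}

# Step 3: print a BackupSchedule manifest for the original hub.
backup_schedule_manifest() {
  cat <<EOF
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: BackupSchedule
metadata:
  name: schedule-acm
  namespace: ${BACKUP_NS}
spec:
  veleroSchedule: "0 */6 * * *"  # illustrative cron schedule
  veleroTtl: 120h                # illustrative retention period
EOF
}

# Step 5: print a Restore manifest that restores the latest backups on the new hub.
restore_manifest() {
  cat <<EOF
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Restore
metadata:
  name: restore-acm
  namespace: ${BACKUP_NS}
spec:
  cleanupBeforeRestore: None
  veleroManagedClustersBackupName: latest
  veleroCredentialsBackupName: latest
  veleroResourcesBackupName: latest
EOF
}

# Usage (requires a logged-in `oc` session; not executed here):
#   label_admin_kubeconfig <cluster-namespace> <admin-kubeconfig-secret>
#   backup_schedule_manifest | oc apply -f -   # on the original hub
#   restore_manifest | oc apply -f -           # on the redeployed hub
```

      The manifests are printed rather than applied directly so they can be reviewed before running `oc apply`.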

      Actual results:

      After the initial restore, all ManagedClusters (both those deployed by the restored ACM hub and the adopted clusters) re-join without issue:

      [root@bastion.<redacted> ~]# oc get managedclusters
      NAME           HUB ACCEPTED   MANAGED CLUSTER URLS          JOINED   AVAILABLE   AGE
      <redacted>     true           https://api.<redacted>:6443   True     True        2m52s
      <redacted>     true           https://api.<redacted>:6443   True     True        2m52s


      Shortly afterward, the adopted cluster's AVAILABLE status changes to Unknown while JOINED remains True:

      [root@bastion.<redacted> ~]# oc get managedclusters
      NAME           HUB ACCEPTED   MANAGED CLUSTER URLS          JOINED   AVAILABLE   AGE
      local-cluster  true           https://api.<redacted>:6443   True     True        4m9s
      <redacted>     true           https://api.<redacted>:6443   True     True        25m
      <redacted>     true           https://api.<redacted>:6443   True     Unknown     25m
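      To spot this symptom without scanning the table by eye, a small filter can list the clusters whose AVAILABLE column is not `True`. This is a sketch that assumes the default column layout printed in this report; the function name `unavailable_clusters` is hypothetical. The `jsonpath` alternative in the comments reads the `ManagedClusterConditionAvailable` condition directly and does not depend on column positions, so it is the more robust choice if a column is ever empty.

```shell
#!/usr/bin/env bash
# Lists ManagedClusters whose AVAILABLE column is not "True", reading the
# tabular output of `oc get managedclusters` from stdin.
# ASSUMPTION: default columns NAME, HUB ACCEPTED, MANAGED CLUSTER URLS,
# JOINED, AVAILABLE, AGE, with every column populated.
set -euo pipefail

unavailable_clusters() {
  # Skip the header row; data values contain no spaces, so AVAILABLE is field 5.
  awk 'NR > 1 && NF >= 6 && $5 != "True" { print $1 }'
}

# Usage against a live hub:
#   oc get managedclusters | unavailable_clusters
# Column-independent alternative that reads the Available condition directly:
#   oc get managedclusters -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="ManagedClusterConditionAvailable")].status}{"\n"}{end}'
```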

      After approximately two hours, the adopted cluster becomes available again without any manual intervention:

      [root@bastion.<redacted> ~]# oc get managedclusters
      NAME           HUB ACCEPTED   MANAGED CLUSTER URLS          JOINED   AVAILABLE   AGE
      local-cluster  true           https://api.<redacted>:6443   True     True        5h26m
      <redacted>     true           https://api.<redacted>:6443   True     True        120m
      <redacted>     true           https://api.<redacted>:6443   True     True        120m

      Expected results:

      Adopted clusters successfully re-join the ACM hub after the restore and remain available.

      Additional info:

      This behavior appears to be similar to https://issues.redhat.com/browse/ACM-8746.

            jiazhu@redhat.com Jian Zhu
            scarlisl@redhat.com Sean Carlisle
            Hui Chen
