ACM-11986: [2.10] Managed clusters created by RHACM do not reconnect to an OADP-restored ACM on their own



      Description of problem:

      This customer's tests of the OADP restore revealed that the clusters created by RHACM did not automatically reimport into the restored RHACM hub.

      Version-Release number of selected component (if applicable):

      2.9

      How reproducible:

      customer environment

      Steps to Reproduce:

      1. Deploy RHACM 2.9.
      2. Deploy clusters with RHACM.
      3. Set up backup with the bare-minimum configuration [1].
      4. Destroy and rebuild RHACM, then restore with OADP; the same URL must be used and everything must be identical [2] (see the restore sketch after this list).
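
      For reference, the full hub restore in step 4 is normally triggered by creating a Restore resource on the rebuilt hub. The sketch below is a minimal example using the Python kubernetes client; the resource name restore-acm is a placeholder, and the spec follows the schema documented for the ACM cluster backup and restore operator, so field names may differ between versions.

      from kubernetes import client, config, dynamic

      # Use the kubeconfig of the rebuilt hub cluster.
      config.load_kube_config()
      dyn = dynamic.DynamicClient(client.ApiClient())

      restore_api = dyn.resources.get(
          api_version="cluster.open-cluster-management.io/v1beta1", kind="Restore"
      )

      # Minimal full hub restore: pull the latest managed-cluster, credential and
      # resource backups, cleaning up anything left over from a previous restore.
      restore = {
          "apiVersion": "cluster.open-cluster-management.io/v1beta1",
          "kind": "Restore",
          "metadata": {"name": "restore-acm", "namespace": "open-cluster-management-backup"},
          "spec": {
              "cleanupBeforeRestore": "CleanupRestored",
              "veleroManagedClustersBackupName": "latest",
              "veleroCredentialsBackupName": "latest",
              "veleroResourcesBackupName": "latest",
          },
      }

      restore_api.create(body=restore, namespace="open-cluster-management-backup")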

      Actual results:

      The managed clusters do not reattach; the klusterlet reports:

      Failed to create &SelfSubjectAccessReview{ObjectMeta:{      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},Spec:SelfSubjectAccessReviewSpec{ResourceAttributes:&ResourceAttributes{Namespace:,Verb:get,Group:certificates.k8s.io,Version:,Resource:certificatesigningrequests,Subresource:,Name:,},NonResourceAttributes:nil,},Status:SubjectAccessReviewStatus{Allowed:false,Reason:,EvaluationError:,Denied:false,},} with hub config secret "open-cluster-management-agent"/"hub-kubeconfig-secret" to apiserver https://api.<hub-domain>:6443: Unauthorized
      

      Expected results:

      The managed clusters automatically reimport into the restored RHACM.

      Additional info:

      [1] - The customer originally did not include certificates in the backup.
      [2] - A backup and restore plan was not provided at this time; this scenario is referred to as a full hub restore.

      More Additional info:

      The onsite consultant raised the point that the klusterlet on a managed cluster that is marked as "Pending Import" after a restore is flagged with the condition HubConnectionDegraded, reason BootstrapSecretFunctional,HubKubeConfigError:

      Failed to create &SelfSubjectAccessReview{ObjectMeta:{      0 0001-01-01 00:00:00 +0000 UTC <nil> <nil> map[] map[] [] [] []},Spec:SelfSubjectAccessReviewSpec{ResourceAttributes:&ResourceAttributes{Namespace:,Verb:get,Group:certificates.k8s.io,Version:,Resource:certificatesigningrequests,Subresource:,Name:,},NonResourceAttributes:nil,},Status:SubjectAccessReviewStatus{Allowed:false,Reason:,EvaluationError:,Denied:false,},} with hub config secret "open-cluster-management-agent"/"hub-kubeconfig-secret" to apiserver https://api.<hub-domain>:6443: Unauthorized
      

      Forcing a re-bootstrap of the hub-kubeconfig-secret on the managed cluster, by manually removing the hub-kubeconfig-secret and the klusterlet agent deployment, successfully completes the import process (using auto-import would likely also work); see the sketch below.
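
      A rough sketch of that workaround with the Python kubernetes client follows. The secret name and namespace come from the error message above; the deployment name klusterlet-agent is an assumption, since depending on the klusterlet version and mode the agent may instead be split into klusterlet-registration-agent and klusterlet-work-agent.

      from kubernetes import client, config

      # Run this against the managed cluster, not the hub.
      config.load_kube_config()
      core = client.CoreV1Api()
      apps = client.AppsV1Api()

      ns = "open-cluster-management-agent"

      # Remove the stale hub kubeconfig so the klusterlet falls back to the
      # bootstrap kubeconfig and re-registers with the restored hub.
      core.delete_namespaced_secret(name="hub-kubeconfig-secret", namespace=ns)

      # Delete the agent deployment; the klusterlet operator recreates it.
      # "klusterlet-agent" is an assumed name and may differ per release.
      apps.delete_namespaced_deployment(name="klusterlet-agent", namespace=ns)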

      Looking at the code base, we can infer that the klusterlet's ssarcontroller detected a working bootstrap secret. The hub kubeconfig seems to be able to authenticate, but lacks the privileges to perform a SAR. Forcing the bootstrapping replicates the recovery process that the bootstrap controller is supposed to initiate when it detects either an expired hub kubeconfig, or a mismatch of the API URL and/or the CA certificates between the bootstrap and the hub kubeconfig. It confirms that the bootstrap kubeconfig is in a working state, and it indicates that the certificates in the hub kubeconfig had been issued by the current CA on the hub (because otherwise, the bootstrap controller would have initiated this recovery automatically). The failing SAR can be replayed by hand, as sketched below.
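
      For illustration only: the same SelfSubjectAccessReview the klusterlet issues can be attempted with the Python kubernetes client against a copy of the kubeconfig stored in the hub-kubeconfig-secret (the local file name hub-kubeconfig.yaml is a placeholder).

      from kubernetes import client, config

      # Kubeconfig extracted from the hub-kubeconfig-secret on the managed
      # cluster, saved to a local file (placeholder name).
      config.load_kube_config(config_file="hub-kubeconfig.yaml")

      authz = client.AuthorizationV1Api()

      # Same check as in the klusterlet log: can this identity get
      # certificatesigningrequests in the certificates.k8s.io group?
      sar = client.V1SelfSubjectAccessReview(
          spec=client.V1SelfSubjectAccessReviewSpec(
              resource_attributes=client.V1ResourceAttributes(
                  group="certificates.k8s.io",
                  resource="certificatesigningrequests",
                  verb="get",
              )
          )
      )

      result = authz.create_self_subject_access_review(sar)
      print("allowed:", result.status.allowed)

      In the failing state described above, this request would be rejected as Unauthorized by the restored hub, matching the klusterlet log.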

      Both the original and the recovered hub cluster use the same custom servingCertificate for their default FQDN in the APIServer CR.
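
      One way to compare that setting on both hubs (a sketch, assuming cluster-admin access to each hub in turn) is to read the namedCertificates from the cluster-scoped APIServer resource:

      from kubernetes import client, config, dynamic

      # Point the default kubeconfig at the hub being inspected.
      config.load_kube_config()
      dyn = dynamic.DynamicClient(client.ApiClient())

      apiservers = dyn.resources.get(api_version="config.openshift.io/v1", kind="APIServer")
      spec = apiservers.get(name="cluster").to_dict().get("spec", {})

      # Print which hostnames are served with which custom serving certificate secret.
      for named in (spec.get("servingCerts") or {}).get("namedCertificates", []) or []:
          print(named.get("names"), "->", (named.get("servingCertificate") or {}).get("name"))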
