Red Hat Advanced Cluster Management / ACM-2483

Failure adding managed cluster: Klusterlet unable to list AppliedManifestWorks


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Normal
    • Affects Version: ACM 2.7.0
    • Component: Cluster Lifecycle
    • Severity: Moderate

      Description of problem:

      Importing a new managed cluster (OCP 4.11.0 with FIPS on OSP) into an ACM 2.7.0 hub keeps failing, and the governance-policy-framework pod on the managed cluster logs:

      x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kube-apiserver-lb-signer")
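      This error comes from TLS chain verification in the agent: after the hub is reinstalled, the klusterlet still trusts the old hub's CA, while the new hub's API server presents a certificate issued by a different CA that carries the same CN ("kube-apiserver-lb-signer"). The following sketch reproduces that mismatch locally with openssl; all filenames and CNs are illustrative, not taken from the clusters in this report:

      ```shell
      # Two distinct CAs that share the CN "kube-apiserver-lb-signer",
      # standing in for the old and new hub signers.
      dir=$(mktemp -d)

      # CA of the destroyed hub (what the stale klusterlet kubeconfig trusts).
      openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
        -subj "/CN=kube-apiserver-lb-signer" \
        -keyout "$dir/old-ca.key" -out "$dir/old-ca.crt" 2>/dev/null

      # CA of the reinstalled hub (same CN, different key pair).
      openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
        -subj "/CN=kube-apiserver-lb-signer" \
        -keyout "$dir/new-ca.key" -out "$dir/new-ca.crt" 2>/dev/null

      # API server leaf certificate, issued by the NEW hub's CA.
      openssl req -newkey rsa:2048 -nodes \
        -subj "/CN=api.example.com" \
        -keyout "$dir/leaf.key" -out "$dir/leaf.csr" 2>/dev/null
      openssl x509 -req -days 1 -in "$dir/leaf.csr" \
        -CA "$dir/new-ca.crt" -CAkey "$dir/new-ca.key" -CAcreateserial \
        -out "$dir/leaf.crt" 2>/dev/null

      # Verifying the new leaf against the OLD CA fails, just as the
      # agent's client does against the reinstalled hub.
      openssl verify -CAfile "$dir/old-ca.crt" "$dir/leaf.crt" || true
      ```

      The verification fails even though the candidate CA's subject matches, which is exactly the "while trying to verify candidate authority certificate" hint in the Go error above.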

      Version-Release number of selected component (if applicable):

      Hub: ACM 2.7.0 on OCP 4.12.0-rc.5 on AWS

      Managed Cluster: OCP 4.11.0 with FIPS on OSP

      How reproducible:

      Reproducible many times when importing a cluster that was previously imported into another hub.

      Steps to Reproduce:

      1. Install OCP 4.12 on AWS: Cluster A
      2. Install ACM 2.7.0 on Cluster A
      3. Install OCP 4.11 on OSP (PSI) not via ACM: Cluster B
      4. Import Cluster B as a managed cluster into ACM
      5. Destroy Cluster A (without detaching Cluster B from the hub)
      6. Repeat steps 1 and 2 (reinstall ACM on new Cluster A)
      7. Repeat step 4 (import cluster B)

      The issue reproduces even with an additional step before step 7:

      Delete all ACM resources from Cluster B (e.g. those found in the "mce", "ocm", and managed cluster namespaces).
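      The leftover state on Cluster B includes the Klusterlet CR and the AppliedManifestWorks recorded for the old hub, which matches the "unable to list AppliedManifestWorks" in the title. A hypothetical cleanup sketch is below; resource names are assumed from upstream open-cluster-management defaults, the supported procedure is ACM's documented detach flow, and objects with hung finalizers may still need manual editing:

      ```shell
      # Hypothetical cleanup of a stale agent on the managed cluster (Cluster B).
      # Assumes `oc` is logged in to Cluster B; guarded so it is a no-op otherwise.
      cleanup_stale_agent() {
        if ! command -v oc >/dev/null 2>&1; then
          echo "oc not found; run this against managed cluster B" >&2
          return 0
        fi

        # Remove the Klusterlet CR first so its operator tears down the agents.
        oc delete klusterlet klusterlet --ignore-not-found

        # AppliedManifestWorks left over from the destroyed hub can block
        # re-registration against the new hub.
        oc delete appliedmanifestworks.work.open-cluster-management.io \
          --all --ignore-not-found || true

        # Agent namespaces created by the previous import.
        oc delete namespace open-cluster-management-agent \
          open-cluster-management-agent-addon --ignore-not-found
      }

      cleanup_stale_agent
      ```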

       

      Full scenario:

      https://qe-jenkins-csb-skynet.apps.ocp-c1.prod.psi.redhat.com/job/ACM-2.7.0-Submariner-0.14.1-AWS-OSP-Globalnet-OVN/77/Test-Report/

      Actual results:

      The managed cluster cannot be imported successfully:

       

      $ oc get managedcluster -o wide
      
      NAME                HUB ACCEPTED   MANAGED CLUSTER URLS                                      JOINED   AVAILABLE   AGE
      acm-aws-nmanos-a2   true            https://api.aws-nmanos-a2.devcluster.openshift.com:6443    True     True        18m
      acm-osp-nmanos-b2   true                                                                              Unknown     18m
      

      See more output in attached logs.

       

      Some of the output I saw:

       $▶ oc get all -n open-cluster-management-agent-addon 
      NAME                                               READY   STATUS             RESTARTS          AGE
      pod/application-manager-656666f644-bknkr           1/1     Running            1 (10d ago)       13d
      pod/cert-policy-controller-64b4768b8-q2h96         1/1     Running            2 (4d7h ago)      13d
      pod/config-policy-controller-7ccc5795-zdfpf        1/1     Running            1 (13d ago)       13d
      pod/governance-policy-framework-74db9d5b6d-8blpv   1/3     CrashLoopBackOff   153 (2m23s ago)   6h9m
      pod/iam-policy-controller-75f4576d89-n8pmz         1/1     Running            3 (4d18h ago)     13d
      pod/klusterlet-addon-search-f647477f6-2d64c        1/1     Running            0                 13d
      pod/klusterlet-addon-workmgr-544bc9b78b-v4fb9      1/1     Running            77 (10d ago)      10d

      NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
      service/klusterlet-addon-workmgr   ClusterIP   100.97.182.219   <none>        443/TCP   13d

      NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/application-manager           1/1     1            1           13d
      deployment.apps/cert-policy-controller        1/1     1            1           13d
      deployment.apps/config-policy-controller      1/1     1            1           13d
      deployment.apps/governance-policy-framework   0/1     1            0           13d
      deployment.apps/iam-policy-controller         1/1     1            1           13d
      deployment.apps/klusterlet-addon-search       1/1     1            1           13d
      deployment.apps/klusterlet-addon-workmgr      1/1     1            1           13d

      NAME                                                     DESIRED   CURRENT   READY   AGE
      replicaset.apps/application-manager-656666f644           1         1         1       13d
      replicaset.apps/cert-policy-controller-64b4768b8         1         1         1       13d
      replicaset.apps/config-policy-controller-7ccc5795        1         1         1       13d
      replicaset.apps/governance-policy-framework-74db9d5b6d   1         1         0       13d
      replicaset.apps/iam-policy-controller-75f4576d89         1         1         1       13d
      replicaset.apps/klusterlet-addon-search-f647477f6        1         1         1       13d
      replicaset.apps/klusterlet-addon-workmgr-544bc9b78b      1         1         1       13d
      replicaset.apps/klusterlet-addon-workmgr-7c94f774d       0         0         0       13d
      
      
      $ oc logs deployment.apps/governance-policy-framework -n open-cluster-management-agent-addon --all-containers
      
      2022-12-19T22:01:26.209Z    info    setup    app/main.go:61    Operator Version: 0.0.1
      2022-12-19T22:01:26.210Z    info    setup    app/main.go:62    Go Version: go1.19.2
      2022-12-19T22:01:26.210Z    info    setup    app/main.go:63    Go OS/Arch: linux/amd64
      2022-12-19T22:01:29.217Z    error    cluster/cluster.go:160    Failed to get API Group-Resources    {"error": "Get \"https://api.aws-nmanos-a2.devcluster.openshift.com:6443/api?timeout=32s\": x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-apiserver-lb-signer\")"}
      sigs.k8s.io/controller-runtime/pkg/cluster.New
          /remote-source/governance-policy-spec-sync/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/cluster/cluster.go:160
      sigs.k8s.io/controller-runtime/pkg/manager.New
          /remote-source/governance-policy-spec-sync/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/manager/manager.go:313
      main.main
          /remote-source/governance-policy-spec-sync/app/main.go:206
      runtime.main
          /usr/lib/golang/src/runtime/proc.go:250
      2022-12-19T22:01:29.217Z    error    setup    app/main.go:208    Failed to start manager    {"error": "Get \"https://api.aws-nmanos-a2.devcluster.openshift.com:6443/api?timeout=32s\": x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-apiserver-lb-signer\")"}
      main.main
          /remote-source/governance-policy-spec-sync/app/main.go:208
      runtime.main
          /usr/lib/golang/src/runtime/proc.go:250
      2022-12-19T22:01:26.425Z    info    setup    app/main.go:54    Operator Version: 0.0.1
      2022-12-19T22:01:26.426Z    info    setup    app/main.go:55    Go Version: go1.19.2
      2022-12-19T22:01:26.426Z    info    setup    app/main.go:56    Go OS/Arch: linux/amd64
      2022-12-19T22:01:26.574Z    error    setup    app/main.go:142    Failed to generate client to the hub cluster    {"error": "Get \"https://api.aws-nmanos-a2.devcluster.openshift.com:6443/api?timeout=32s\": x509: certificate signed by unknown authority (possibly because of \"crypto/rsa: verification error\" while trying to verify candidate authority certificate \"kube-apiserver-lb-signer\")"}
      main.main
          /remote-source/governance-policy-status-sync/app/main.go:142
      runtime.main
          /usr/lib/golang/src/runtime/proc.go:250
      2022-12-19T14:25:33.477Z    info    setup    app/main.go:37    Using    {"OperatorVersion": "0.0.1", "GoVersion": "go1.19.2", "GOOS": "linux", "GOARCH": "amd64"}
      2022-12-19T14:25:36.332Z    info    setup    app/main.go:134    Registering components
      2022-12-19T14:25:36.332Z    info    setup    app/main.go:157    Starting the manager
      2022-12-19T14:25:36.333Z    info    runtime/asm_amd64.s:1594    Starting server    {"kind": "health probe", "addr": "[::]:8083"}
      2022-12-19T14:25:53.401Z    info    controller.policy-template-sync    controller/controller.go:234    Starting EventSource    {"reconciler group": "policy.open-cluster-management.io", "reconciler kind": "Policy", "source": "kind source: *v1.Policy"}
      2022-12-19T14:25:53.401Z    info    controller.policy-template-sync    controller/controller.go:234    Starting Controller    {"reconciler group": "policy.open-cluster-management.io", "reconciler kind": "Policy"}
      2022-12-19T14:25:53.502Z    info    controller.policy-template-sync    controller/controller.go:234    Starting workers    {"reconciler group": "policy.open-cluster-management.io", "reconciler kind": "Policy", "worker count": 1}
        

      Expected results:

      Managed cluster should be imported successfully.

      Additional info:

      The must-gather logs are larger than 300 MB, so they cannot be uploaded here. Let me know if you need partial logs from them.

              zyin@redhat.com Zhiwei Yin
              nmanos@redhat.com Noam Manos
              Hui Chen
              ACM QE Team