Red Hat Advanced Cluster Management / ACM-8240

ClusterCurator not triggering post-hook after agent-based hosted cluster install

      Description of problem:

      When installing an agent-based hosted cluster with ClusterCurator, the install-stage post-hook is not triggered on Ansible Tower after the install stage completes.

      The ClusterCurator CR status contains the following condition:

        - lastTransitionTime: "2023-10-18T21:24:13Z"
          message: 'curator-job-lfbtd DesiredCuration: install Failed - interface conversion:
            interface {} is nil, not map[string]interface {}'
          reason: Job_failed
          status: "True"
          type: clustercurator-job 
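When automating this check (for example in the OCP-68375 test), the failure can be detected from the CR status rather than from pod state. A minimal sketch in Go, assuming the conditions have already been fetched and decoded (for instance from `oc get clustercurator hosted-1 -n clusters -o json`). The `condition` struct and `curatorJobFailed` helper are hypothetical, and the type/status/reason values matched are only those seen in this report, not a documented contract.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// condition mirrors (hypothetically) the fields shown in the ClusterCurator status.
type condition struct {
	Type    string `json:"type"`
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

// curatorJobFailed reports whether the clustercurator-job condition carries
// the failure values observed in this report, returning its message.
func curatorJobFailed(conds []condition) (bool, string) {
	for _, c := range conds {
		if c.Type == "clustercurator-job" && c.Status == "True" && c.Reason == "Job_failed" {
			return true, c.Message
		}
	}
	return false, ""
}

func main() {
	// JSON equivalent of the condition quoted above.
	raw := `[{"type":"clustercurator-job","status":"True","reason":"Job_failed",
	  "message":"curator-job-lfbtd DesiredCuration: install Failed - interface conversion: interface {} is nil, not map[string]interface {}"}]`
	var conds []condition
	if err := json.Unmarshal([]byte(raw), &conds); err != nil {
		panic(err)
	}
	failed, msg := curatorJobFailed(conds)
	fmt.Println(failed, strings.Contains(msg, "interface conversion"))
}
```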

      The curator-job pod is in the "Init:Error" state:

      $ oc get po -n clusters
      NAME                      READY   STATUS       RESTARTS   AGE
      curator-job-lfbtd-x4r6d   0/1     Init:Error   0          8m51s
      prehookjob-xtt5h-gmg79    0/1     Completed    0          8m12s 

      The activate-and-monitor init container in the curator-job-lfbtd-x4r6d pod shows the following logs:

      $ oc logs -n clusters curator-job-lfbtd-x4r6d -c activate-and-monitor
      I1018 21:24:10.506372       1 curator.go:86] Mode: activate-and-monitor Cluster
      I1018 21:24:11.506007       1 curator.go:111] Found clusterCurator resource "clusters" ✓
      I1018 21:24:12.406396       1 hypershift.go:34] * Initiate Hypershift Provisioning
      I1018 21:24:12.705437       1 hypershift.go:65] Looking up hostedclusters hosted-1 namespace clusters
      I1018 21:24:12.906785       1 hypershift.go:80] Found hostedclusters hosted-1 in namespace clusters ✓
      I1018 21:24:12.906879       1 hypershift.go:100] Patching hostedclusters hosted-1 in namespace clusters ✓
      I1018 21:24:12.925932       1 hypershift.go:107] Updated hostedclusters ✓
      I1018 21:24:13.206791       1 hypershift.go:65] Looking up nodepools hosted-1 namespace clusters
      I1018 21:24:13.307978       1 hypershift.go:80] Found nodepools hosted-1 in namespace clusters ✓
      I1018 21:24:13.308014       1 hypershift.go:100] Patching nodepools hosted-1 in namespace clusters ✓
      I1018 21:24:13.406278       1 hypershift.go:107] Updated nodepools ✓
      I1018 21:24:13.505868       1 hypershift.go:121] Waiting up to 750s for Hypershift Provisioning job
      I1018 21:24:13.517140       1 hypershift.go:157] Found HostedCluster status details ✓
      panic: interface conversion: interface {} is nil, not map[string]interface {} [recovered]
          panic: interface conversion: interface {} is nil, not map[string]interface {}

      goroutine 1 [running]:
      main.curatorRun.func1()
          /remote-source/app/cmd/curator/curator.go:150 +0x259
      panic({0x183f1a0, 0xc00044ab40})
          /usr/lib/golang/src/runtime/panic.go:884 +0x213
      github.com/stolostron/cluster-curator-controller/pkg/jobs/hypershift.MonitorClusterStatus({0x1ca5520, 0xc0001f0390}, {0x1cc2a60, 0xc0004ce5a0}, {0x7fff5b7aae21, 0x8}, {0xc000216098, 0x8}, {0x1a5d802, 0x9}, ...)
          /remote-source/app/pkg/jobs/hypershift/hypershift.go:164 +0x109a
      main.curatorRun(0x0?, {0x1cc2a60, 0xc0004ce5a0}, {0x7fff5b7aae21, 0x8}, {0xc000216098, 0x8})
          /remote-source/app/cmd/curator/curator.go:247 +0x138c
      main.main()
          /remote-source/app/cmd/curator/curator.go:66 +0x1c5 
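The panic text is what the Go runtime reports for an unchecked type assertion on a nil interface value. The standard-library sketch below reproduces the same message and shows the comma-ok form that avoids it. It assumes, without having checked the code at hypershift.go:164, that the monitor reads a nested HostedCluster status field from an unstructured map and that this field is absent for this agent-based hosted cluster; `unsafeGetMap`, `safeGetMap`, and the `exampleField` key are hypothetical.

```go
package main

import "fmt"

// unsafeGetMap mirrors the assumed failing pattern: an unchecked type
// assertion. If key is missing, obj[key] is a nil interface and it panics.
func unsafeGetMap(obj map[string]interface{}, key string) map[string]interface{} {
	return obj[key].(map[string]interface{})
}

// safeGetMap uses the comma-ok form, so a missing or mistyped field
// yields ok == false instead of a panic.
func safeGetMap(obj map[string]interface{}, key string) (map[string]interface{}, bool) {
	m, ok := obj[key].(map[string]interface{})
	return m, ok
}

// panicMessage runs f and returns the text of any panic it raises.
func panicMessage(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

func main() {
	// Hypothetical status object in which "exampleField" is not populated.
	status := map[string]interface{}{"conditions": []interface{}{}}

	fmt.Println(panicMessage(func() { _ = unsafeGetMap(status, "exampleField") }))
	// prints: interface conversion: interface {} is nil, not map[string]interface {}

	if _, ok := safeGetMap(status, "exampleField"); !ok {
		fmt.Println("exampleField missing; keep waiting instead of panicking")
	}
}
```

With Kubernetes unstructured objects, helpers such as `unstructured.NestedMap` from k8s.io/apimachinery return a found flag for the same purpose.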

      Version-Release number of selected component (if applicable):

      MCE 2.4.0-DOWNANDBACK-2023-10-15-14-41-45

      How reproducible:

      100%

      Steps to Reproduce:

      The steps are described in Polarion test case OCP-68375: https://polarion.engineering.redhat.com/polarion/redirect/project/OSE/workitem?id=OCP-68375

      Actual results:

      The post-hook playbook is not triggered after the HostedCluster is installed, and the curator-job pod is in Init:Error.

      Expected results:

      The post-hook is triggered after the HostedCluster reaches the "Completed" state.
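Why the post-hook never runs can be modeled as a sequential pipeline that stops at the first failing stage. The sketch below is a simplified model, not the cluster-curator-controller code: the "prehook" and "activate-and-monitor" names are borrowed from the status conditions in this report, "posthook" is hypothetical, and the assumption is that a panic during monitoring fails the job before the posthook stage is reached.

```go
package main

import "fmt"

// stage is a hypothetical step in the install curation pipeline.
type stage struct {
	name string
	run  func() error
}

// runRecovered converts a panic in f into an error.
func runRecovered(f func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	return f()
}

// runStages runs stages in order and stops at the first failure,
// so any later stage (here, the posthook) never executes.
func runStages(stages []stage) (completed []string, err error) {
	for _, s := range stages {
		if e := runRecovered(s.run); e != nil {
			return completed, fmt.Errorf("%s failed: %w", s.name, e)
		}
		completed = append(completed, s.name)
	}
	return completed, nil
}

func main() {
	posthookRan := false
	stages := []stage{
		{"prehook", func() error { return nil }},
		{"activate-and-monitor", func() error {
			var status map[string]interface{}
			// Indexing a nil map yields a nil interface; the assertion panics.
			_ = status["missing"].(map[string]interface{})
			return nil
		}},
		{"posthook", func() error { posthookRan = true; return nil }},
	}
	done, err := runStages(stages)
	fmt.Println(done, err, posthookRan)
}
```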

      Additional info:

      Full ClusterCurator YAML:

      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: ClusterCurator
      metadata:
        annotations:
          kubectl.kubernetes.io/last-applied-configuration: |
            {"apiVersion":"cluster.open-cluster-management.io/v1beta1","kind":"ClusterCurator","metadata":{"annotations":{},"name":"hosted-1","namespace":"clusters"},"spec":{"desiredCuration":"install","destroy":{"jobMonitorTimeout":5,"posthook":[{"extra_vars":{"cluster":"hosted-1","stage":"posthook","type":"destroy"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"prehook":[{"extra_vars":{"cluster":"hosted-1","stage":"prehook","type":"destroy"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"towerAuthSecret":"aap-tower-cred"},"install":{"jobMonitorTimeout":5,"posthook":[{"extra_vars":{"cluster":"hosted-1","stage":"posthook","type":"install"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"prehook":[{"extra_vars":{"cluster":"hosted-1","stage":"prehook","type":"install"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"towerAuthSecret":"aap-tower-cred"}}}
        creationTimestamp: "2023-10-18T21:22:51Z"
        generation: 11
        name: hosted-1
        namespace: clusters
        resourceVersion: "1358101"
        uid: f7c27a7a-3ae0-4a37-b17e-78824b1b7679
      spec:
        destroy:
          jobMonitorTimeout: 5
          posthook:
          - extra_vars:
              cluster: hosted-1
              stage: posthook
              type: destroy
            name: Auto_CLC_Sample_Template
            type: Job
          prehook:
          - extra_vars:
              cluster: hosted-1
              stage: prehook
              type: destroy
            name: Auto_CLC_Sample_Template
            type: Job
          towerAuthSecret: aap-tower-cred
        install:
          jobMonitorTimeout: 5
          posthook:
          - extra_vars:
              cluster: hosted-1
              stage: posthook
              type: install
            name: Auto_CLC_Sample_Template
            type: Job
          prehook:
          - extra_vars:
              cluster: hosted-1
              stage: prehook
              type: install
            name: Auto_CLC_Sample_Template
            type: Job
          towerAuthSecret: aap-tower-cred
        scale:
          jobMonitorTimeout: 5
        upgrade:
          monitorTimeout: 120
      status:
        conditions:
        - lastTransitionTime: "2023-10-18T21:24:13Z"
          message: 'curator-job-lfbtd DesiredCuration: install Failed - interface conversion:
            interface {} is nil, not map[string]interface {}'
          reason: Job_failed
          status: "True"
          type: clustercurator-job
        - lastTransitionTime: "2023-10-18T21:23:47Z"
          message: Completed executing init container
          reason: Job_has_finished
          status: "True"
          type: prehook-ansiblejob
        - lastTransitionTime: "2023-10-18T21:23:46Z"
          message: prehookjob-xtt5h
          reason: Job_has_finished
          status: "True"
          type: current-ansiblejob
        - lastTransitionTime: "2023-10-18T21:23:40Z"
          message: https://acmqe-test-ansible.install.dev09.red-chesterfield.com/#/jobs/playbook/6089
          reason: ansiblejob_url
          status: "True"
          type: prehookjob-xtt5h
        - lastTransitionTime: "2023-10-18T21:24:12Z"
          message: Executing init container activate-and-monitor
          reason: Job_has_finished
          status: "False"
          type: activate-and-monitor 

      Attachments:

        1. hosted-1_hypershift-control-plane.yaml (14 kB, Elsa Passaro)
        2. hostedcluster.yaml (9 kB, Elsa Passaro)

      Feng Xiang (fxiang@redhat.com)
      Elsa Passaro (epassaro@redhat.com)