ACM-11038 (Red Hat Advanced Cluster Management)

Post-hook automation template not triggered after cluster installation with CIM


    • Type: Bug
    • Resolution: Not a Bug
    • Cluster Lifecycle, Console
    • Important

      Description of the problem:

      Installed a 4.14 standalone cluster using CIM (late-binding) on a 4.14 hub cluster with MCE 2.4.5-DOWNANDBACK-2024-04-08-15-45-29. The ClusterCurator successfully triggered the pre-hook job for the install stage, but it failed to trigger the post-hook job after the cluster installation completed. The cause appears to be a timeout in the `activate-and-monitor` container of the `curator-job` pod.
      From the ClusterCurator status:

      status:
        conditions:
        - lastTransitionTime: "2024-04-10T14:57:07Z"
          message: 'curator-job-2dc9s DesiredCuration: install Failed - Timed out waiting
            for job'
          reason: Job_failed
          status: "True"
          type: clustercurator-job
        - lastTransitionTime: "2024-04-10T14:43:47Z"
          message: Completed executing init container
          reason: Job_has_finished
          status: "True"
          type: prehook-ansiblejob
        - lastTransitionTime: "2024-04-10T14:43:46Z"
          message: prehookjob-2cmr8
          reason: Job_has_finished
          status: "True"
          type: current-ansiblejob
        - lastTransitionTime: "2024-04-10T14:43:41Z"
          message: https://acmqe-test-ansible.install.dev09.red-chesterfield.com/#/jobs/playbook/8532
          reason: ansiblejob_url
          status: "True"
          type: prehookjob-2cmr8
        - lastTransitionTime: "2024-04-10T14:57:06Z"
          message: Timed out waiting for job
          reason: Job_failed
          status: "True"
          type: activate-and-monitor 
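
To make the failing stages easier to spot, the conditions above can be filtered by `reason`. The following is a minimal sketch, not part of the product: the `conditions` list is transcribed by hand from the status shown above (only `type`, `reason`, and `message`), and `failed_conditions` is a hypothetical helper name.

```python
# Conditions transcribed from the ClusterCurator status above.
# Only the fields needed for the filter are kept.
conditions = [
    {"type": "clustercurator-job", "reason": "Job_failed",
     "message": "curator-job-2dc9s DesiredCuration: install Failed - Timed out waiting for job"},
    {"type": "prehook-ansiblejob", "reason": "Job_has_finished",
     "message": "Completed executing init container"},
    {"type": "current-ansiblejob", "reason": "Job_has_finished",
     "message": "prehookjob-2cmr8"},
    {"type": "prehookjob-2cmr8", "reason": "ansiblejob_url",
     "message": "https://acmqe-test-ansible.install.dev09.red-chesterfield.com/#/jobs/playbook/8532"},
    {"type": "activate-and-monitor", "reason": "Job_failed",
     "message": "Timed out waiting for job"},
]


def failed_conditions(conds):
    """Return the conditions whose reason is Job_failed (hypothetical helper)."""
    return [c for c in conds if c["reason"] == "Job_failed"]


for cond in failed_conditions(conditions):
    print(f'{cond["type"]}: {cond["message"]}')
```

Only the overall `clustercurator-job` condition and the `activate-and-monitor` stage report `Job_failed`; the pre-hook conditions all finished, and no post-hook condition appears at all.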

      And from the `activate-and-monitor` container's logs:

      $ oc logs -n spoke-1 curator-job-2dc9s-lmtj6 -c activate-and-monitor
      I0410 14:44:13.092834       1 curator.go:86] Mode: activate-and-monitor Cluster
      I0410 14:44:14.892435       1 curator.go:111] Found clusterCurator resource "spoke-1" ✓
      I0410 14:44:17.792301       1 hive.go:36] * Initiate Provisioning
      I0410 14:44:17.792359       1 hive.go:37] Looking up cluster spoke-1
      I0410 14:44:17.894102       1 hive.go:45] Found cluster spoke-1 ✓
      2024/04/10 14:44:18 Updated ClusterDeployment ✓
      I0410 14:44:19.791973       1 hive.go:104] Waiting up to 750s for Hive Provisioning job
      I0410 14:44:20.198859       1 hive.go:227] Attempt: 1/150, pause 5s
      I0410 14:44:25.391953       1 hive.go:227] Attempt: 2/150, pause 5s
      I0410 14:44:30.691870       1 hive.go:227] Attempt: 3/150, pause 5s
      ...
      ...
      I0410 14:56:56.691914       1 hive.go:227] Attempt: 149/150, pause 5s
      I0410 14:57:01.813008       1 hive.go:227] Attempt: 150/150, pause 5s
      E0410 14:57:07.192040       1 curator.go:243] Timed out waiting for job
      panic: Timed out waiting for job [recovered]
          panic: Timed out waiting for job
      goroutine 1 [running]:
      main.curatorRun.func1()
          /remote-source/app/cmd/curator/curator.go:150 +0x23d
      panic({0x171edc0?, 0xc00070b230?})
          /usr/lib/golang/src/runtime/panic.go:914 +0x21f
      main.curatorRun(0x0?, {0x1bfa8e0?, 0xc0002aa990}, {0x7ffc70e63e22, 0x7}, {0xc0000140e0, 0x7})
          /remote-source/app/cmd/curator/curator.go:244 +0x38aa
      main.main()
          /remote-source/app/cmd/curator/curator.go:66 +0x1bf
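
The log pattern above (150 attempts with a 5 s pause each, announced as "up to 750s") is consistent with a bounded polling loop. The sketch below only illustrates that pattern and is not the controller's actual code: `wait_for_job`, `is_done`, and the injectable `sleep` parameter are hypothetical names chosen for this example.

```python
import time


def wait_for_job(is_done, attempts=150, pause=5, sleep=time.sleep):
    """Poll is_done() up to `attempts` times, pausing `pause` seconds after each miss.

    Returns the attempt number on success and raises TimeoutError otherwise.
    """
    for attempt in range(1, attempts + 1):
        if is_done():
            return attempt
        print(f"Attempt: {attempt}/{attempts}, pause {pause}s")
        sleep(pause)
    raise TimeoutError("Timed out waiting for job")


# Demonstration with a fake sleep that records the pauses instead of waiting.
slept = []
try:
    wait_for_job(lambda: False, sleep=slept.append)
except TimeoutError as err:
    print(f"{err} after {sum(slept)} seconds of pauses")
```

With these defaults the pauses add up to 150 × 5 = 750 s. In the log the loop runs from 14:44:19 to 14:57:07, roughly 768 s, which fits 750 s of pauses plus a small overhead per check. If the installation takes longer than that budget, a loop of this shape gives up before the cluster is ready, and any later stage such as the post-hook is never reached.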

       

      Steps to reproduce:

      1. Install a late-binding standalone multi-node cluster using CIM
      2. Add an automation template in the `Create cluster` wizard
      3. Wait for the cluster to finish installing

      Actual results:

      The post-hook job is not triggered, and the curator-job pod ends up in `Init:Error` status

      Expected results:

      The post-hook job is triggered successfully

              rh-ee-kcormier Kevin Cormier
              epassaro@redhat.com Elsa Passaro
              Lubov Shilin
              David Huynh David Huynh