Bug
Resolution: Not a Bug
Important
Description of the problem:
Installed a 4.14 standalone cluster using CIM (late-binding) on a 4.14 hub cluster with MCE 2.4.5-DOWNANDBACK-2024-04-08-15-45-29. The ClusterCurator successfully triggered the pre-hook job for the install stage, but it failed to trigger the post-hook job after the cluster installation completed. The cause appears to be a timeout in the `activate-and-monitor` container of the `curator-job` pod.
From the ClusterCurator status:
```yaml
status:
  conditions:
  - lastTransitionTime: "2024-04-10T14:57:07Z"
    message: 'curator-job-2dc9s DesiredCuration: install Failed - Timed out waiting for job'
    reason: Job_failed
    status: "True"
    type: clustercurator-job
  - lastTransitionTime: "2024-04-10T14:43:47Z"
    message: Completed executing init container
    reason: Job_has_finished
    status: "True"
    type: prehook-ansiblejob
  - lastTransitionTime: "2024-04-10T14:43:46Z"
    message: prehookjob-2cmr8
    reason: Job_has_finished
    status: "True"
    type: current-ansiblejob
  - lastTransitionTime: "2024-04-10T14:43:41Z"
    message: https://acmqe-test-ansible.install.dev09.red-chesterfield.com/#/jobs/playbook/8532
    reason: ansiblejob_url
    status: "True"
    type: prehookjob-2cmr8
  - lastTransitionTime: "2024-04-10T14:57:06Z"
    message: Timed out waiting for job
    reason: Job_failed
    status: "True"
    type: activate-and-monitor
```
And from the `activate-and-monitor` container's logs:
```
$ oc logs -n spoke-1 curator-job-2dc9s-lmtj6 -c activate-and-monitor
I0410 14:44:13.092834 1 curator.go:86] Mode: activate-and-monitor Cluster
I0410 14:44:14.892435 1 curator.go:111] Found clusterCurator resource "spoke-1" ✓
I0410 14:44:17.792301 1 hive.go:36] * Initiate Provisioning
I0410 14:44:17.792359 1 hive.go:37] Looking up cluster spoke-1
I0410 14:44:17.894102 1 hive.go:45] Found cluster spoke-1 ✓
2024/04/10 14:44:18 Updated ClusterDeployment ✓
I0410 14:44:19.791973 1 hive.go:104] Waiting up to 750s for Hive Provisioning job
I0410 14:44:20.198859 1 hive.go:227] Attempt: 1/150, pause 5s
I0410 14:44:25.391953 1 hive.go:227] Attempt: 2/150, pause 5s
I0410 14:44:30.691870 1 hive.go:227] Attempt: 3/150, pause 5s
...
...
I0410 14:56:56.691914 1 hive.go:227] Attempt: 149/150, pause 5s
I0410 14:57:01.813008 1 hive.go:227] Attempt: 150/150, pause 5s
E0410 14:57:07.192040 1 curator.go:243] Timed out waiting for job
panic: Timed out waiting for job [recovered]
	panic: Timed out waiting for job

goroutine 1 [running]:
main.curatorRun.func1()
	/remote-source/app/cmd/curator/curator.go:150 +0x23d
panic({0x171edc0?, 0xc00070b230?})
	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
main.curatorRun(0x0?, {0x1bfa8e0?, 0xc0002aa990}, {0x7ffc70e63e22, 0x7}, {0xc0000140e0, 0x7})
	/remote-source/app/cmd/curator/curator.go:244 +0x38aa
main.main()
	/remote-source/app/cmd/curator/curator.go:66 +0x1bf
```
Steps to reproduce:
- Install a late-binding standalone multinode cluster using CIM
- Add an automation template in the `Create cluster` wizard
- Wait for the cluster to finish installing
Actual results:
The post-hook job is not triggered, and the curator-job pod is in `Init:Error` status.
Expected results:
The post-hook job is triggered successfully