Bug
Resolution: Done
Critical
ACM 2.9.0
Important
Description of problem:
When installing an agent-based hosted cluster with ClusterCurator, the post-hook for the install stage is not triggered on Ansible Tower after the install stage completes.
The ClusterCurator CR status contains the following condition:
- lastTransitionTime: "2023-10-18T21:24:13Z"
  message: 'curator-job-lfbtd DesiredCuration: install Failed - interface conversion: interface {} is nil, not map[string]interface {}'
  reason: Job_failed
  status: "True"
  type: clustercurator-job
The curator-job pod is in status "Init:Error":
$ oc get po -n clusters
NAME                      READY   STATUS       RESTARTS   AGE
curator-job-lfbtd-x4r6d   0/1     Init:Error   0          8m51s
prehookjob-xtt5h-gmg79    0/1     Completed    0          8m12s
And the activate-and-monitor container inside the curator-job-lfbtd-x4r6d pod is showing the following logs:
$ oc logs -n clusters curator-job-lfbtd-x4r6d -c activate-and-monitor
I1018 21:24:10.506372 1 curator.go:86] Mode: activate-and-monitor Cluster
I1018 21:24:11.506007 1 curator.go:111] Found clusterCurator resource "clusters" ✓
I1018 21:24:12.406396 1 hypershift.go:34] * Initiate Hypershift Provisioning
I1018 21:24:12.705437 1 hypershift.go:65] Looking up hostedclusters hosted-1 namespace clusters
I1018 21:24:12.906785 1 hypershift.go:80] Found hostedclusters hosted-1 in namespace clusters ✓
I1018 21:24:12.906879 1 hypershift.go:100] Patching hostedclusters hosted-1 in namespace clusters ✓
I1018 21:24:12.925932 1 hypershift.go:107] Updated hostedclusters ✓
I1018 21:24:13.206791 1 hypershift.go:65] Looking up nodepools hosted-1 namespace clusters
I1018 21:24:13.307978 1 hypershift.go:80] Found nodepools hosted-1 in namespace clusters ✓
I1018 21:24:13.308014 1 hypershift.go:100] Patching nodepools hosted-1 in namespace clusters ✓
I1018 21:24:13.406278 1 hypershift.go:107] Updated nodepools ✓
I1018 21:24:13.505868 1 hypershift.go:121] Waiting up to 750s for Hypershift Provisioning job
I1018 21:24:13.517140 1 hypershift.go:157] Found HostedCluster status details ✓
panic: interface conversion: interface {} is nil, not map[string]interface {} [recovered]
    panic: interface conversion: interface {} is nil, not map[string]interface {}

goroutine 1 [running]:
main.curatorRun.func1()
    /remote-source/app/cmd/curator/curator.go:150 +0x259
panic({0x183f1a0, 0xc00044ab40})
    /usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/stolostron/cluster-curator-controller/pkg/jobs/hypershift.MonitorClusterStatus({0x1ca5520, 0xc0001f0390}, {0x1cc2a60, 0xc0004ce5a0}, {0x7fff5b7aae21, 0x8}, {0xc000216098, 0x8}, {0x1a5d802, 0x9}, ...)
    /remote-source/app/pkg/jobs/hypershift/hypershift.go:164 +0x109a
main.curatorRun(0x0?, {0x1cc2a60, 0xc0004ce5a0}, {0x7fff5b7aae21, 0x8}, {0xc000216098, 0x8})
    /remote-source/app/cmd/curator/curator.go:247 +0x138c
main.main()
    /remote-source/app/cmd/curator/curator.go:66 +0x1c5
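For context, the panic message above is what Go emits when an unchecked type assertion to map[string]interface {} is applied to a nil interface value, for example a status field that has not been populated yet. The following is a minimal sketch of the comma-ok form that avoids this failure mode. It is not the code at hypershift.go:164, which is not shown in this report; the helper name nestedMap and the "version" field are hypothetical and used only for illustration.

```go
package main

import "fmt"

// nestedMap is a hypothetical helper. It walks obj along the given keys and
// returns the map found at the end. Each step uses the comma-ok form of the
// type assertion, so a missing or nil field yields an error instead of a panic.
func nestedMap(obj map[string]interface{}, keys ...string) (map[string]interface{}, error) {
	cur := obj
	for _, k := range keys {
		next, ok := cur[k].(map[string]interface{})
		if !ok {
			return nil, fmt.Errorf("field %q is missing or not an object", k)
		}
		cur = next
	}
	return cur, nil
}

func main() {
	// Assumed shape: status exists, but the nested "version" object
	// (a made-up field name) has not been set yet.
	hc := map[string]interface{}{
		"status": map[string]interface{}{},
	}

	// The unchecked equivalent,
	//   hc["status"].(map[string]interface{})["version"].(map[string]interface{})
	// would panic here because the "version" lookup returns a nil interface.
	if _, err := nestedMap(hc, "status", "version"); err != nil {
		fmt.Println("status not ready yet:", err)
	}
}
```

With a check like this, a monitoring loop could treat the missing field as "not ready yet" and retry until its timeout, rather than terminating the init container.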
Version-Release number of selected component (if applicable):
MCE 2.4.0-DOWNANDBACK-2023-10-15-14-41-45
How reproducible:
100%
Steps to Reproduce:
Steps are described in test case https://polarion.engineering.redhat.com/polarion/redirect/project/OSE/workitem?id=OCP-68375
Actual results:
The post-hook playbook is not triggered after the hostedcluster is installed, and the curator-job pod remains in Init:Error.
Expected results:
The post-hook is triggered after the hostedcluster reaches the "Completed" state.
Additional info:
Full ClusterCurator YAML:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: ClusterCurator
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cluster.open-cluster-management.io/v1beta1","kind":"ClusterCurator","metadata":{"annotations":{},"name":"hosted-1","namespace":"clusters"},"spec":{"desiredCuration":"install","destroy":{"jobMonitorTimeout":5,"posthook":[{"extra_vars":{"cluster":"hosted-1","stage":"posthook","type":"destroy"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"prehook":[{"extra_vars":{"cluster":"hosted-1","stage":"prehook","type":"destroy"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"towerAuthSecret":"aap-tower-cred"},"install":{"jobMonitorTimeout":5,"posthook":[{"extra_vars":{"cluster":"hosted-1","stage":"posthook","type":"install"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"prehook":[{"extra_vars":{"cluster":"hosted-1","stage":"prehook","type":"install"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"towerAuthSecret":"aap-tower-cred"}}}
  creationTimestamp: "2023-10-18T21:22:51Z"
  generation: 11
  name: hosted-1
  namespace: clusters
  resourceVersion: "1358101"
  uid: f7c27a7a-3ae0-4a37-b17e-78824b1b7679
spec:
  destroy:
    jobMonitorTimeout: 5
    posthook:
    - extra_vars:
        cluster: hosted-1
        stage: posthook
        type: destroy
      name: Auto_CLC_Sample_Template
      type: Job
    prehook:
    - extra_vars:
        cluster: hosted-1
        stage: prehook
        type: destroy
      name: Auto_CLC_Sample_Template
      type: Job
    towerAuthSecret: aap-tower-cred
  install:
    jobMonitorTimeout: 5
    posthook:
    - extra_vars:
        cluster: hosted-1
        stage: posthook
        type: install
      name: Auto_CLC_Sample_Template
      type: Job
    prehook:
    - extra_vars:
        cluster: hosted-1
        stage: prehook
        type: install
      name: Auto_CLC_Sample_Template
      type: Job
    towerAuthSecret: aap-tower-cred
  scale:
    jobMonitorTimeout: 5
  upgrade:
    monitorTimeout: 120
status:
  conditions:
  - lastTransitionTime: "2023-10-18T21:24:13Z"
    message: 'curator-job-lfbtd DesiredCuration: install Failed - interface conversion: interface {} is nil, not map[string]interface {}'
    reason: Job_failed
    status: "True"
    type: clustercurator-job
  - lastTransitionTime: "2023-10-18T21:23:47Z"
    message: Completed executing init container
    reason: Job_has_finished
    status: "True"
    type: prehook-ansiblejob
  - lastTransitionTime: "2023-10-18T21:23:46Z"
    message: prehookjob-xtt5h
    reason: Job_has_finished
    status: "True"
    type: current-ansiblejob
  - lastTransitionTime: "2023-10-18T21:23:40Z"
    message: https://acmqe-test-ansible.install.dev09.red-chesterfield.com/#/jobs/playbook/6089
    reason: ansiblejob_url
    status: "True"
    type: prehookjob-xtt5h
  - lastTransitionTime: "2023-10-18T21:24:12Z"
    message: Executing init container activate-and-monitor
    reason: Job_has_finished
    status: "False"
    type: activate-and-monitor