Bug
Resolution: Done
Critical
ACM 2.9.0
None
False
False
Important
No
Description of problem:
When installing an agent-based hosted cluster with ClusterCurator, the post-hook for the install stage is not triggered on Ansible Tower after the install stage completes.
The ClusterCurator CR status contains the following condition:
- lastTransitionTime: "2023-10-18T21:24:13Z"
  message: 'curator-job-lfbtd DesiredCuration: install Failed - interface conversion: interface {} is nil, not map[string]interface {}'
  reason: Job_failed
  status: "True"
  type: clustercurator-job
The curator-job pod is in status "Init:Error":
$ oc get po -n clusters
NAME                      READY   STATUS       RESTARTS   AGE
curator-job-lfbtd-x4r6d   0/1     Init:Error   0          8m51s
prehookjob-xtt5h-gmg79    0/1     Completed    0          8m12s
The activate-and-monitor container inside the curator-job-lfbtd-x4r6d pod shows the following logs:
$ oc logs -n clusters curator-job-lfbtd-x4r6d -c activate-and-monitor
I1018 21:24:10.506372 1 curator.go:86] Mode: activate-and-monitor Cluster
I1018 21:24:11.506007 1 curator.go:111] Found clusterCurator resource "clusters" ✓
I1018 21:24:12.406396 1 hypershift.go:34] * Initiate Hypershift Provisioning
I1018 21:24:12.705437 1 hypershift.go:65] Looking up hostedclusters hosted-1 namespace clusters
I1018 21:24:12.906785 1 hypershift.go:80] Found hostedclusters hosted-1 in namespace clusters ✓
I1018 21:24:12.906879 1 hypershift.go:100] Patching hostedclusters hosted-1 in namespace clusters ✓
I1018 21:24:12.925932 1 hypershift.go:107] Updated hostedclusters ✓
I1018 21:24:13.206791 1 hypershift.go:65] Looking up nodepools hosted-1 namespace clusters
I1018 21:24:13.307978 1 hypershift.go:80] Found nodepools hosted-1 in namespace clusters ✓
I1018 21:24:13.308014 1 hypershift.go:100] Patching nodepools hosted-1 in namespace clusters ✓
I1018 21:24:13.406278 1 hypershift.go:107] Updated nodepools ✓
I1018 21:24:13.505868 1 hypershift.go:121] Waiting up to 750s for Hypershift Provisioning job
I1018 21:24:13.517140 1 hypershift.go:157] Found HostedCluster status details ✓
panic: interface conversion: interface {} is nil, not map[string]interface {} [recovered]
    panic: interface conversion: interface {} is nil, not map[string]interface {}

goroutine 1 [running]:
main.curatorRun.func1()
    /remote-source/app/cmd/curator/curator.go:150 +0x259
panic({0x183f1a0, 0xc00044ab40})
    /usr/lib/golang/src/runtime/panic.go:884 +0x213
github.com/stolostron/cluster-curator-controller/pkg/jobs/hypershift.MonitorClusterStatus({0x1ca5520, 0xc0001f0390}, {0x1cc2a60, 0xc0004ce5a0}, {0x7fff5b7aae21, 0x8}, {0xc000216098, 0x8}, {0x1a5d802, 0x9}, ...)
    /remote-source/app/pkg/jobs/hypershift/hypershift.go:164 +0x109a
main.curatorRun(0x0?, {0x1cc2a60, 0xc0004ce5a0}, {0x7fff5b7aae21, 0x8}, {0xc000216098, 0x8})
    /remote-source/app/cmd/curator/curator.go:247 +0x138c
main.main()
    /remote-source/app/cmd/curator/curator.go:66 +0x1c5
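For context, the panic message above is what the Go runtime emits when a single-value type assertion to map[string]interface {} is applied to a nil interface value. The following is a minimal, self-contained sketch of that failure mode and of the comma-ok form that avoids it. It is not the code at hypershift.go:164 and not the actual fix; the object layout and the function names statusMap and uncheckedStatusMap are hypothetical and chosen only for illustration.

```go
package main

import "fmt"

// statusMap is a hypothetical helper. It returns obj["status"] as a map and
// reports ok=false when the field is missing, nil, or of another type.
// The two-value (comma-ok) type assertion never panics.
func statusMap(obj map[string]interface{}) (map[string]interface{}, bool) {
	status, ok := obj["status"].(map[string]interface{})
	return status, ok
}

// uncheckedStatusMap is a hypothetical helper that uses the single-value
// type assertion. On a nil field the assertion panics; the deferred
// function recovers and converts the panic into an error so that this
// demo program can keep running.
func uncheckedStatusMap(obj map[string]interface{}) (status map[string]interface{}, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return obj["status"].(map[string]interface{}), nil
}

func main() {
	// Assumption: a resource whose status field has not been populated yet.
	obj := map[string]interface{}{"status": nil}

	if _, err := uncheckedStatusMap(obj); err != nil {
		fmt.Println("unchecked:", err)
	}
	if _, ok := statusMap(obj); !ok {
		fmt.Println("checked: status not populated yet, caller can retry")
	}
}
```

With the comma-ok form, a monitoring loop could treat a missing field as "not ready yet" and poll again instead of terminating the container.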
Version-Release number of selected component (if applicable):
MCE 2.4.0-DOWNANDBACK-2023-10-15-14-41-45
How reproducible:
100%
Steps to Reproduce:
Steps are described in the test case at https://polarion.engineering.redhat.com/polarion/redirect/project/OSE/workitem?id=OCP-68375
Actual results:
The post-hook playbook is not triggered after the hostedcluster is installed, and the curator-job pod is in status "Init:Error".
Expected results:
The post-hook is triggered after the hostedcluster reaches the "Completed" state.
Additional info:
Full ClusterCurator YAML:
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: ClusterCurator
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cluster.open-cluster-management.io/v1beta1","kind":"ClusterCurator","metadata":{"annotations":{},"name":"hosted-1","namespace":"clusters"},"spec":{"desiredCuration":"install","destroy":{"jobMonitorTimeout":5,"posthook":[{"extra_vars":{"cluster":"hosted-1","stage":"posthook","type":"destroy"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"prehook":[{"extra_vars":{"cluster":"hosted-1","stage":"prehook","type":"destroy"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"towerAuthSecret":"aap-tower-cred"},"install":{"jobMonitorTimeout":5,"posthook":[{"extra_vars":{"cluster":"hosted-1","stage":"posthook","type":"install"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"prehook":[{"extra_vars":{"cluster":"hosted-1","stage":"prehook","type":"install"},"name":"Auto_CLC_Sample_Template","type":"Job"}],"towerAuthSecret":"aap-tower-cred"}}}
  creationTimestamp: "2023-10-18T21:22:51Z"
  generation: 11
  name: hosted-1
  namespace: clusters
  resourceVersion: "1358101"
  uid: f7c27a7a-3ae0-4a37-b17e-78824b1b7679
spec:
  destroy:
    jobMonitorTimeout: 5
    posthook:
    - extra_vars:
        cluster: hosted-1
        stage: posthook
        type: destroy
      name: Auto_CLC_Sample_Template
      type: Job
    prehook:
    - extra_vars:
        cluster: hosted-1
        stage: prehook
        type: destroy
      name: Auto_CLC_Sample_Template
      type: Job
    towerAuthSecret: aap-tower-cred
  install:
    jobMonitorTimeout: 5
    posthook:
    - extra_vars:
        cluster: hosted-1
        stage: posthook
        type: install
      name: Auto_CLC_Sample_Template
      type: Job
    prehook:
    - extra_vars:
        cluster: hosted-1
        stage: prehook
        type: install
      name: Auto_CLC_Sample_Template
      type: Job
    towerAuthSecret: aap-tower-cred
  scale:
    jobMonitorTimeout: 5
  upgrade:
    monitorTimeout: 120
status:
  conditions:
  - lastTransitionTime: "2023-10-18T21:24:13Z"
    message: 'curator-job-lfbtd DesiredCuration: install Failed - interface conversion:
      interface {} is nil, not map[string]interface {}'
    reason: Job_failed
    status: "True"
    type: clustercurator-job
  - lastTransitionTime: "2023-10-18T21:23:47Z"
    message: Completed executing init container
    reason: Job_has_finished
    status: "True"
    type: prehook-ansiblejob
  - lastTransitionTime: "2023-10-18T21:23:46Z"
    message: prehookjob-xtt5h
    reason: Job_has_finished
    status: "True"
    type: current-ansiblejob
  - lastTransitionTime: "2023-10-18T21:23:40Z"
    message: https://acmqe-test-ansible.install.dev09.red-chesterfield.com/#/jobs/playbook/6089
    reason: ansiblejob_url
    status: "True"
    type: prehookjob-xtt5h
  - lastTransitionTime: "2023-10-18T21:24:12Z"
    message: Executing init container activate-and-monitor
    reason: Job_has_finished
    status: "False"
    type: activate-and-monitor