Bug
Resolution: Done
Normal
ACM 2.12.0
1
False
None
False
Installer Sprint 2024-45, Installer Sprint 2024-46
Moderate
Yes
Description of problem:
This appears to be an intermittent race condition in which the ClusterRoles are deleted before the submariner components have been properly cleaned up.
Version-Release number of selected component (if applicable):
ACM 2.12
Actual results:
The uninstall gets stuck because of the race condition.
Expected results:
The uninstall should complete.
Additional info:
2024-10-22T11:37:44.513Z INFO reconcile Finalizing template {"Kind": "ConfigMap", "Name": "manager-config"}
2024-10-22T11:37:44.518Z INFO reconcile Deleting InternalHubComponent {"Name": "search", "Namespace": "open-cluster-management"}
2024-10-22T11:37:44.566Z INFO reconcile Deleting InternalHubComponent {"Name": "submariner-addon", "Namespace": "open-cluster-management"}
2024-10-22T11:37:44.570Z INFO reconcile Deleting InternalHubComponent {"Name": "volsync", "Namespace": "open-cluster-management"}
2024-10-22T11:37:44.671Z INFO reconcile Clusterroles finalized
2024-10-22T11:37:44.762Z INFO reconcile Clusterrolebindings finalized
2024-10-22T11:37:44.762Z INFO reconcile Deleting MultiClusterEngine resource
2024-10-22T11:37:45.217Z INFO reconcile Finalizing: MCE has not yet been terminated
2024-10-22T11:37:45.245Z INFO reconcile Reconciling MultiClusterHub
2024-10-22T11:37:45.280Z INFO overrides Found overrides from environment variables set by OPERAND_IMAGE_ prefix
2024-10-22T11:37:45.280Z INFO reconcile Overriding Image Repository from annotation: quay.io:443/acm-d
2024-10-22T11:37:45.287Z INFO reconcile All helmreleases have been terminated
2024-10-22T11:37:45.300Z INFO reconcile Finalizing template {"Kind": "Deployment", "Name": "multicluster-operators-channel"}
2024-10-22T11:37:45.381Z INFO reconcile Finalizing template {"Kind": "Deployment", "Name": "multicluster-operators-application"}
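The ClusterRoles should only be deleted after the components have been fully removed. In the log above, "Clusterroles finalized" appears at 11:37:44.671, roughly 100 ms after deletion of the submariner-addon InternalHubComponent was started and with no log line showing that the component had finished terminating. I therefore suspect that a return statement somewhere in our finalizer code is not stopping the flow correctly, for example by not returning early while a component is still terminating.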
W1022 14:11:27.915221 1 cmd.go:254] Using insecure, self-signed certificates
I1022 14:11:28.219438 1 leaderelection.go:121] The leader election gives 4 retries and allows for 30s of clock skew. The kube-apiserver downtime tolerance is 78s. Worst non-graceful lease acquisition is 2m43s. Worst graceful lease acquisition is {26s}.
I1022 14:11:28.219834 1 observer_polling.go:159] Starting file observer
W1022 14:11:28.228239 1 builder.go:266] unable to get owner reference (falling back to namespace): pods "submariner-addon-6c95b9f8f7-kfdm2" is forbidden: User "system:serviceaccount:open-cluster-management:submariner-addon" cannot get resource "pods" in API group "" in the namespace "open-cluster-management"
I1022 14:11:28.228383 1 builder.go:298] submariner-controller version v0.0.0-unknown--
I1022 14:11:57.281874 1 cmd.go:138] Received SIGTERM or SIGINT signal, shutting down controller.
F1022 14:11:57.281954 1 cmd.go:179] failed checking apiserver connectivity: leases.coordination.k8s.io "submariner-controller-lock" is forbidden: User "system:serviceaccount:open-cluster-management:submariner-addon" cannot get resource "leases" in API group "coordination.k8s.io" in the namespace "open-cluster-management"
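These logs came up while discussing the issue with Disaiah. They show that submariner-addon no longer has the permissions it needs: its service account cannot get pods or leases in the open-cluster-management namespace, and the controller exits when its lease check fails.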