Bug
Resolution: Unresolved
Normal
4.18
Quality / Stability / Reliability
Description of problem:
HyperShift agent-based installation on a VMware vSphere OCP cluster causes the capi-provider pod to crash.
Version-Release number of selected component (if applicable):
4.18
How reproducible:
100%
Steps to Reproduce:
Once the hosts are bound to the HCP cluster, the NodePool shows an incorrect status, the Machines show no NodeName, and the capi-provider pod crashes.

# oc get agents -A
NAMESPACE   NAME                                   CLUSTER        APPROVED   ROLE     STAGE
hcp-agent   465b0642-2976-ee47-26a5-b27ffe8e8208   dpateriy-hcp   true       worker   Done
hcp-agent   cbb00642-4351-8f7a-b2ec-116afb2e863b   dpateriy-hcp   true       worker   Done

# oc get machines -A
NAMESPACE                    NAME                 CLUSTER              NODENAME   PROVIDERID                                     PHASE         AGE     VERSION
hcp-namespace-dpateriy-hcp   dpateriy-hcp-wcjxz   dpateriy-hcp-86b8l              agent://cbb00642-4351-8f7a-b2ec-116afb2e863b   Provisioned   3h54m   4.18.4
hcp-namespace-dpateriy-hcp   dpateriy-hcp-wjrc7   dpateriy-hcp-86b8l              agent://465b0642-2976-ee47-26a5-b27ffe8e8208   Provisioned   3h54m   4.18.4

# oc logs capi-provider-658b97547f-tc6tf -n hcp-namespace-dpateriy-hcp
2025-04-18T16:59:15Z ERROR Reconciler error {"controller": "agentmachine", "controllerGroup": "capi-provider.agent-install.openshift.io", "controllerKind": "AgentMachine", "AgentMachine": {"name":"dpateriy-hcp-wcjxz","namespace":"hcp-namespace-dpateriy-hcp"}, "namespace": "hcp-namespace-dpateriy-hcp", "name": "dpateriy-hcp-wcjxz", "reconcileID": "b958985c-56aa-453c-a6fb-af2a29b8396d", "error": "failed to find node with name 00-50-56-86-2a-c3", "errorVerbose": "failed to find node with name 00-50-56-86-2a-c3\ngithub.com/openshift/cluster-api-provider-agent/controllers.(*NodeProviderIDReconciler).setNodeProviderID\n\t/remote-source/app/controllers/node_provider_id_controller.go:96\ngithub.com/openshift/cluster-api-provider-agent/controllers.(*NodeProviderIDReconciler).Reconcile\n\t/remote-source/app/controllers/node_provider_id_controller.go:70\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:119\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:316\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227\nruntime.goexit\n\t/usr/lib/golang/src/runtime/asm_amd64.s:1700"}
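The name that setNodeProviderID fails to find, 00-50-56-86-2a-c3, has the shape of a MAC address with dashes in place of colons, and 00:50:56 is an OUI registered to VMware. That suggests, though the report does not confirm it, that the host name was derived from the vSphere NIC's MAC address and may not match the name the node actually registered under. A minimal Go sketch that checks a name for that shape follows; the helper names and the short OUI list are illustrative assumptions, not part of cluster-api-provider-agent.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// dashedMAC matches six hex octets separated by dashes, e.g. "00-50-56-86-2a-c3".
var dashedMAC = regexp.MustCompile(`^([0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2}$`)

// vmwareOUIs is a short, non-exhaustive list of VMware OUI prefixes (assumption:
// only used here for illustration).
var vmwareOUIs = []string{"00-50-56", "00-0c-29"}

// looksLikeDashedMAC reports whether name looks like a MAC address with ":" replaced by "-".
func looksLikeDashedMAC(name string) bool {
	return dashedMAC.MatchString(name)
}

// hasVMwareOUI reports whether name starts with one of the listed VMware OUIs.
func hasVMwareOUI(name string) bool {
	lower := strings.ToLower(name)
	for _, oui := range vmwareOUIs {
		if strings.HasPrefix(lower, oui+"-") {
			return true
		}
	}
	return false
}

func main() {
	// Node name copied verbatim from the capi-provider error above.
	name := "00-50-56-86-2a-c3"
	fmt.Println(looksLikeDashedMAC(name), hasVMwareOUI(name))
}
```

Running it prints `true true` for the name in the error; the Machines' CLUSTER value, dpateriy-hcp-86b8l, would not match the pattern.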
Actual results:
The agent-based HostedCluster deployment does not complete.
Expected results:
The HostedCluster deployment should succeed, the NodePool should show the nodes as registered, and each Machine should have its NodeName populated.
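The expected result can be phrased as a check over the `oc get machines -A` output: every Machine bound to an Agent should also carry a NodeName. A minimal Go sketch, using a hypothetical `machineRow` struct (not the Cluster API Machine type) filled with the two rows from this report, flags the Machines that fail it and resolves each `agent://` providerID to its Agent name.

```go
package main

import (
	"fmt"
	"strings"
)

// machineRow is a hypothetical, minimal view of the `oc get machines` columns
// used here; it is not the Cluster API Machine type.
type machineRow struct {
	Name       string
	NodeName   string
	ProviderID string
}

// reportRows mirrors the `oc get machines -A` output in this report;
// NodeName is empty for both rows.
var reportRows = []machineRow{
	{Name: "dpateriy-hcp-wcjxz", ProviderID: "agent://cbb00642-4351-8f7a-b2ec-116afb2e863b"},
	{Name: "dpateriy-hcp-wjrc7", ProviderID: "agent://465b0642-2976-ee47-26a5-b27ffe8e8208"},
}

// agentID strips the "agent://" prefix seen in the PROVIDERID column and
// reports whether the prefix was present.
func agentID(providerID string) (string, bool) {
	const prefix = "agent://"
	if !strings.HasPrefix(providerID, prefix) {
		return "", false
	}
	return strings.TrimPrefix(providerID, prefix), true
}

// machinesMissingNode returns the names of Machines whose NodeName is empty,
// i.e. the Machines that do not meet the expected result.
func machinesMissingNode(rows []machineRow) []string {
	var missing []string
	for _, r := range rows {
		if strings.TrimSpace(r.NodeName) == "" {
			missing = append(missing, r.Name)
		}
	}
	return missing
}

func main() {
	for _, r := range reportRows {
		id, _ := agentID(r.ProviderID)
		fmt.Printf("machine=%s agent=%s nodeName=%q\n", r.Name, id, r.NodeName)
	}
	fmt.Println("missing NodeName:", machinesMissingNode(reportRows))
}
```

In the report's data both providerIDs resolve to Agents listed as approved and in stage Done, so the bind itself appears to have succeeded; what is missing is the link from each Machine to a Node.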
Additional info:
MCE must-gather: https://drive.google.com/file/d/1kVJHFBeCfu-F9RxPCwajSsko6NH-y5aJ/view?usp=drive_link
Inspect report for the hcp-namespace-dpateriy-hcp project: https://drive.google.com/file/d/1ZlUBs4DHU64hUXHXoS_VW-VA0rYlKyeq/view?usp=sharing