Bug
Resolution: Duplicate
4.14
Description of problem:
The customer is installing an OCP 4.14 cluster with 3 master and 4 worker nodes using the ZTP approach on ACM 2.9.5. The BMH hosts are booted with the ISO, but they are not added to the cluster. The nodes remain stuck in the provisioning state in the ACM UI because they cannot communicate with the ironic agent.
$ omc get bmh -n ocp10
svocp10wrk01.ocp10.pod4ocp.nbnco.lab   OK   provisioning   idrac-virtualmedia://10.0.32.183/redfish/v1/Systems/System.Embedded.1   unknown   true   21h
svocp10wrk02.ocp10.pod4ocp.nbnco.lab   OK   provisioning   idrac-virtualmedia://10.0.32.184/redfish/v1/Systems/System.Embedded.1   unknown   true   21h
svocp10wrk03.ocp10.pod4ocp.nbnco.lab   OK   provisioning   idrac-virtualmedia://10.0.32.185/redfish/v1/Systems/System.Embedded.1   unknown   true   21h
svocp10wrk04.ocp10.pod4ocp.nbnco.lab   OK   provisioning   idrac-virtualmedia://10.0.32.186/redfish/v1/Systems/System.Embedded.1   unknown   true   21h
svocp10wrk05.ocp10.pod4ocp.nbnco.lab   OK   provisioning   idrac-virtualmedia://10.0.32.187/redfish/v1/Systems/System.Embedded.1   unknown   true   21h
svocp10wrk06.ocp10.pod4ocp.nbnco.lab   OK   provisioning   idrac-virtualmedia://10.0.32.188/redfish/v1/Systems/System.Embedded.1   unknown   true   21h
svocp10wrk07.ocp10.pod4ocp.nbnco.lab   OK   provisioning   idrac-virtualmedia://10.0.32.189/redfish/v1/Systems/System.Embedded.1   unknown   true   21h

$ omc get aci
NAME    CLUSTER   STATE
ocp10   ocp10     pending-for-input
..
- lastProbeTime: "2024-11-20T03:56:07Z"
  lastTransitionTime: "2024-11-20T03:56:07Z"
  message: 'The cluster''s validations are pending for user: Clusters must have exactly 3 dedicated control plane nodes. Add or remove hosts, or change their roles configurations to meet the requirement.,Hosts have not been discovered yet,Hosts have not been discovered yet,Hosts have not been discovered yet,Hosts have not been discovered yet,At least one of the CIDRs (Machine Network, Cluster Network, Service Network) is undefined.,At least one of the CIDRs (Machine Network, Cluster Network, Service Network) is undefined.'
  reason: ValidationsUserPending
  status: "False"
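The aci condition message above packs several validation texts into one string, which makes repeated entries easy to miss. The following Python sketch splits and counts them. It is a hypothetical helper, not part of ACM or omc, and it relies on an assumed heuristic: separate validations are joined by a bare comma with no following space, while commas inside a single validation text are followed by a space. That holds for this message but is not a documented format.

```python
import re
from collections import Counter

# Condition message copied from the `omc get aci` output in this report,
# with the YAML '' escape resolved to a single apostrophe.
MESSAGE = (
    "The cluster's validations are pending for user: "
    "Clusters must have exactly 3 dedicated control plane nodes. "
    "Add or remove hosts, or change their roles configurations to meet the requirement.,"
    "Hosts have not been discovered yet,"
    "Hosts have not been discovered yet,"
    "Hosts have not been discovered yet,"
    "Hosts have not been discovered yet,"
    "At least one of the CIDRs (Machine Network, Cluster Network, Service Network) is undefined.,"
    "At least one of the CIDRs (Machine Network, Cluster Network, Service Network) is undefined."
)

PREFIX = "The cluster's validations are pending for user: "


def pending_validations(message: str) -> Counter:
    """Split a validation message into individual texts and count duplicates.

    Assumed heuristic: validations are joined by "," with no following space;
    commas inside one validation text are followed by a space.
    """
    body = message[len(PREFIX):] if message.startswith(PREFIX) else message
    parts = re.split(r",(?=\S)", body)  # split only on commas not followed by whitespace
    return Counter(part.strip() for part in parts if part.strip())


if __name__ == "__main__":
    for text, count in pending_validations(MESSAGE).items():
        print(f"{count} x {text}")
```

Run against this report's message, it lists the control-plane count validation once, "Hosts have not been discovered yet" four times, and the CIDR validation twice.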
On the ACM hub cluster, the metal3-state service is missing ports 6388 and 5051:
$ oc get services -n openshift-machine-api
NAME                                 TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)                      AGE
baremetal-operator-webhook-service   ClusterIP   192.168.162.113   <none>        443/TCP                      603d
cluster-autoscaler-operator          ClusterIP   192.168.84.157    <none>        443/TCP,9192/TCP             603d
cluster-baremetal-operator-service   ClusterIP   192.168.240.36    <none>        8443/TCP                     603d
cluster-baremetal-webhook-service    ClusterIP   192.168.225.165   <none>        443/TCP                      603d
control-plane-machine-set-operator   ClusterIP   192.168.185.83    <none>        9443/TCP                     603d
machine-api-controllers              ClusterIP   192.168.100.22    <none>        8441/TCP,8442/TCP,8444/TCP   603d
machine-api-operator                 ClusterIP   192.168.40.247    <none>        8443/TCP                     603d
machine-api-operator-webhook         ClusterIP   192.168.6.165     <none>        443/TCP                      603d
metal3-image-customization-service   ClusterIP   192.168.131.65    <none>        80/TCP                       603d
metal3-state                         ClusterIP   192.168.194.215   <none>        6180/TCP,6183/TCP            603
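To check the port list mechanically, the following Python sketch parses oc get services output and reports which expected ports a service lacks. The expected set for metal3-state (6180, 6183, 6388 and 5051, all TCP) is an assumption taken from this report, not from metal3 or OpenShift documentation. The parser also assumes the default column layout NAME, TYPE, CLUSTER-IP, EXTERNAL-IP, PORT(S), AGE, with PORT(S) as the fifth column and entries in plain port/protocol form. The helper names are hypothetical.

```python
# Hypothetical helper: report ports missing from a Service in `oc get services` output.
# Sample rows copied from the output in this report.
SERVICES_OUTPUT = """\
NAME                                 TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)             AGE
machine-api-operator-webhook         ClusterIP   192.168.6.165     <none>        443/TCP             603d
metal3-image-customization-service   ClusterIP   192.168.131.65    <none>        80/TCP              603d
metal3-state                         ClusterIP   192.168.194.215   <none>        6180/TCP,6183/TCP   603
"""

# Assumption: the expected port set comes from this report, not from product docs.
EXPECTED_METAL3_STATE_PORTS = {"6180/TCP", "6183/TCP", "6388/TCP", "5051/TCP"}


def service_ports(text: str, service: str) -> set[str]:
    """Return the PORT(S) entries for `service` (5th column of the default layout)."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == service:
            return set(fields[4].split(","))
    raise KeyError(f"service {service!r} not found in output")


def missing_ports(text: str, service: str, expected: set[str]) -> set[str]:
    """Return the expected ports that the service does not expose."""
    return expected - service_ports(text, service)


if __name__ == "__main__":
    missing = missing_ports(SERVICES_OUTPUT, "metal3-state", EXPECTED_METAL3_STATE_PORTS)
    print(sorted(missing))
```

With the rows from this report it prints ['5051/TCP', '6388/TCP']. NodePort-style entries such as 443:30443/TCP would not match the plain port/protocol strings and would need extra parsing.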
Excerpt from 0200-worker-journal.log:
Nov 21 09:31:19 localhost.localdomain podman[9499]: 2024-11-21 09:31:19.538 1 ERROR ironic-python-agent raise ConnectionError(e, request=request)
Nov 21 09:31:19 localhost.localdomain podman[9499]: 2024-11-21 09:31:19.538 1 ERROR ironic-python-agent requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.0.9.10', port=5050): Max retries exceeded with url: /v1/continue (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f00f3799af0>: Failed to establish a new connection: [Errno 111] ECONNREFUSED'))
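[Errno 111] ECONNREFUSED in the journal excerpt above means the TCP connection attempt was actively rejected (typically nothing is listening on that address and port, or a firewall rule rejects the connection), as opposed to a timeout, which usually points to packets being silently dropped. The following Python sketch, run from a machine on the same network as the hosts, distinguishes those cases for 10.0.9.10:5050. It is only a plain TCP check under that assumption: it does not perform the TLS handshake or the HTTPS request to /v1/continue that ironic-python-agent makes, and the function name is hypothetical.

```python
import socket
import sys


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt a plain TCP connect and classify the result.

    Returns "open", "refused" (ECONNREFUSED, as in the journal excerpt),
    "timeout", or "error: <details>" for other OS-level failures.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError as exc:  # e.g. network unreachable, no route to host
        return f"error: {exc}"


if __name__ == "__main__":
    # Defaults to the endpoint from the journal excerpt; override on the command line.
    host = sys.argv[1] if len(sys.argv) > 1 else "10.0.9.10"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 5050
    print(f"{host}:{port} -> {probe_tcp(host, port)}")
```

Saved to a file, it can be run as python3 <file> 10.0.9.10 5050. An "open" result with continued agent errors would shift suspicion from the network path to TLS or the HTTP service itself.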
Steps to Reproduce:
1. Install an OCP 4.14 cluster using the ACM/ZTP SiteConfig approach.
Actual results:
Hosts get stuck in the provisioning state.
Expected results:
Hosts should have been added to the cluster.
Additional info: