-
Bug
-
Resolution: Done
-
Major
-
None
-
None
Description of problem:
When provisioning a hive cluster using 4.14 night images, cluster provision failed.
Version-Release number of selected components (if applicable):
registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-05-23-103225
How reproducible:
Always
Steps to Reproduce:
- Provision a hive cluster.
- Waiting cluster install successfully.
Actual results:
Cluster install failed.
[hmx@fedora hive]$ oc get cd NAME INFRAID PLATFORM REGION VERSION CLUSTERTYPE PROVISIONSTATUS POWERSTATE AGE mihuang1072 mihuang1072-7z8fb aws us-east-2 Provisioning 54m [hmx@fedora hive]$ oc get pods NAME READY STATUS RESTARTS AGE mihuang1072-0-68xxf-provision-cl2dh 0/3 Completed 0 3h16m mihuang1072-3-6pmrv-provision-q9l7s 0/3 Completed 0 96m mihuang1072-4-l2lwq-provision-6fjws 0/3 Completed 0 56m mihuang1072-5-kvk2c-provision-x9sgg 1/3 NotReady 0 9m58s
Cluster status:
conditions: - lastProbeTime: "2023-05-26T02:56:06Z" lastTransitionTime: "2023-05-26T02:56:06Z" message: Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane reason: KubeAPIWaitFailed status: "True" type: ProvisionFailed
Provision pod error logs:
time="2023-05-26T02:56:03Z" level=error msg="Attempted to gather debug logs after installation failure: failed to connect to the bootstrap machine: dial tcp 18.188.153.215:22: connect: connection timed out" time="2023-05-26T02:56:03Z" level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get \"https://api.mihuang1072.qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators\": dial tcp 3.130.47.186:6443: connect: connection refused" time="2023-05-26T02:56:03Z" level=error msg="Bootstrap failed to complete: Get \"https://api.mihuang1072.qe.devcluster.openshift.com:6443/version\": dial tcp 3.18.54.92:6443: i/o timeout" time="2023-05-26T02:56:03Z" level=error msg="Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane." time="2023-05-26T02:56:04Z" level=error msg="error after waiting for command completion" error="exit status 5" installID=lbz5pcqx time="2023-05-26T02:56:04Z" level=error msg="error provisioning cluster" error="exit status 5" installID=lbz5pcqx time="2023-05-26T02:56:04Z" level=error msg="error running openshift-install, running deprovision to clean up" error="exit status 5" installID=lbz5pcqx
Hive controller error logs:
time="2023-05-26T02:56:06.648Z" level=debug msg="matching search string" clusterProvision=default/mihuang1072-0-68xxf controller=clusterProvision job=mihuang1072-0-68xxf-provision reconcileID=fchb84mf regexName=ProxyTimeout searchString="(?i)error pinging docker registry .+ proxyconnect tcp: dial tcp [^ ]+: connect: no route to host" time="2023-05-26T02:56:06.648Z" level=debug msg="matching search string" clusterProvision=default/mihuang1072-0-68xxf controller=clusterProvision job=mihuang1072-0-68xxf-provision reconcileID=fchb84mf regexName=ProxyInvalidCABundle searchString="(?i)error pinging docker registry .+ proxyconnect tcp: x509: certificate signed by unknown authority"
Expected results:
Cluster install successfully.
Additional info:
hive-operator-5c5ccb9557-dq2wp.log
- is related to
-
OCPBUGS-17154 4.14/AWS: Machines using m4 instance types don't get network
- Closed
- links to