Uploaded image for project: 'OpenShift Hive'
  1. OpenShift Hive
  2. HIVE-2232

Failed to provision hive cluster using 4.14 nightly image.

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • None
    • None
    • Low

      Description of problem:

      When provisioning a hive cluster using 4.14 night images, cluster provision failed.

      Version-Release number of selected components (if applicable):
      registry.ci.openshift.org/ocp/release:4.14.0-0.nightly-2023-05-23-103225

      How reproducible:
      Always

      Steps to Reproduce:

      1. Provision a hive cluster.
      2. Waiting cluster install successfully.

      Actual results:

      Cluster install failed.

      [hmx@fedora hive]$ oc get cd
      NAME          INFRAID             PLATFORM   REGION      VERSION   CLUSTERTYPE   PROVISIONSTATUS   POWERSTATE   AGE
      mihuang1072   mihuang1072-7z8fb   aws        us-east-2                           Provisioning                   54m
      [hmx@fedora hive]$ oc get pods
      NAME                                  READY   STATUS      RESTARTS   AGE
      mihuang1072-0-68xxf-provision-cl2dh   0/3     Completed   0          3h16m
      mihuang1072-3-6pmrv-provision-q9l7s   0/3     Completed   0          96m
      mihuang1072-4-l2lwq-provision-6fjws   0/3     Completed   0          56m
      mihuang1072-5-kvk2c-provision-x9sgg   1/3     NotReady    0          9m58s
       

      Cluster status:

        conditions:
        - lastProbeTime: "2023-05-26T02:56:06Z"
          lastTransitionTime: "2023-05-26T02:56:06Z"
          message: Failed waiting for Kubernetes API. This error usually happens when there
            is a problem on the bootstrap host that prevents creating a temporary control
            plane
          reason: KubeAPIWaitFailed
          status: "True"
          type: ProvisionFailed
      

      Provision pod error logs:

      time="2023-05-26T02:56:03Z" level=error msg="Attempted to gather debug logs after installation failure: failed to connect to the bootstrap machine: dial tcp 18.188.153.215:22: connect: connection timed out"
      time="2023-05-26T02:56:03Z" level=error msg="Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get \"https://api.mihuang1072.qe.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusteroperators\": dial tcp 3.130.47.186:6443: connect: connection refused"
      time="2023-05-26T02:56:03Z" level=error msg="Bootstrap failed to complete: Get \"https://api.mihuang1072.qe.devcluster.openshift.com:6443/version\": dial tcp 3.18.54.92:6443: i/o timeout"
      time="2023-05-26T02:56:03Z" level=error msg="Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane."
      time="2023-05-26T02:56:04Z" level=error msg="error after waiting for command completion" error="exit status 5" installID=lbz5pcqx
      time="2023-05-26T02:56:04Z" level=error msg="error provisioning cluster" error="exit status 5" installID=lbz5pcqx
      time="2023-05-26T02:56:04Z" level=error msg="error running openshift-install, running deprovision to clean up" error="exit status 5" installID=lbz5pcqx
      

      Hive controller error logs:

      time="2023-05-26T02:56:06.648Z" level=debug msg="matching search string" clusterProvision=default/mihuang1072-0-68xxf controller=clusterProvision job=mihuang1072-0-68xxf-provision reconcileID=fchb84mf regexName=ProxyTimeout searchString="(?i)error pinging docker registry .+ proxyconnect tcp: dial tcp [^ ]+: connect: no route to host"
      time="2023-05-26T02:56:06.648Z" level=debug msg="matching search string" clusterProvision=default/mihuang1072-0-68xxf controller=clusterProvision job=mihuang1072-0-68xxf-provision reconcileID=fchb84mf regexName=ProxyInvalidCABundle searchString="(?i)error pinging docker registry .+ proxyconnect tcp: x509: certificate signed by unknown authority"
      

      Expected results:

      Cluster install successfully.

      Additional info:

      provisionfailed.log
      error.log

      x509error.log

      hive-operator-5c5ccb9557-dq2wp.log

      mihuang1072-0-68xxf-provision-cl2dh.log

      hive-controllers-548d6b9845-6vlx9.log

        1. error.log
          268 kB
        2. hive-controllers-548d6b9845-6vlx9.log
          6.87 MB
        3. hive-operator-5c5ccb9557-dq2wp.log
          36.30 MB
        4. m5provisionlogs
          531 kB
        5. mihuang1072-0-68xxf-provision-cl2dh.log
          477 kB
        6. provisionfailed.log
          477 kB
        7. x509error.log
          8 kB

            sumehta Suhani Mehta
            mihuang@redhat.com Mingxia Huang
            Mingxia Huang Mingxia Huang
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: