Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-34482

[azure] installer failed as timeout waiting for bootstrap complete, but bootstrap stage is completed

XMLWordPrintable

    • Important
    • No
    • Rejected
    • False
    • Hide

      None

      Show
      None

      Description of problem:

        On 4.16 recent nightly build, found some jobs failed as installer timeout to wait for bootstrap-complete.
      
      ===============
      time="2024-05-26T13:52:35Z" level=info msg="Waiting up to 30m0s (until 2:22PM UTC) for bootstrapping to complete..."
      time="2024-05-26T14:22:27Z" level=debug msg="Fetching Bootstrap SSH Key Pair..."
      time="2024-05-26T14:22:27Z" level=debug msg="Loading Bootstrap SSH Key Pair..."
      time="2024-05-26T14:22:27Z" level=debug msg="Using Bootstrap SSH Key Pair loaded from state file"
      time="2024-05-26T14:22:27Z" level=debug msg="Reusing previously-fetched Bootstrap SSH Key Pair"
      time="2024-05-26T14:22:27Z" level=debug msg="Fetching Install Config..."
      time="2024-05-26T14:22:27Z" level=debug msg="Loading Install Config..."
      time="2024-05-26T14:22:27Z" level=debug msg="  Loading SSH Key..."
      time="2024-05-26T14:22:27Z" level=debug msg="  Loading Base Domain..."
      time="2024-05-26T14:22:27Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T14:22:27Z" level=debug msg="  Loading Cluster Name..."
      time="2024-05-26T14:22:27Z" level=debug msg="    Loading Base Domain..."
      time="2024-05-26T14:22:27Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T14:22:27Z" level=debug msg="  Loading Pull Secret..."
      time="2024-05-26T14:22:27Z" level=debug msg="  Loading Platform..."
      time="2024-05-26T14:22:27Z" level=debug msg="Using Install Config loaded from state file"
      time="2024-05-26T14:22:27Z" level=debug msg="Reusing previously-fetched Install Config"
      time="2024-05-26T14:22:27Z" level=info msg="Pulling Cluster API manifests"
      time="2024-05-26T14:22:27Z" level=info msg="Failed to gather Cluster API manifests: either Cluster API manifests not generated or terraform provision"
      time="2024-05-26T14:22:27Z" level=info msg="Pulling VM console logs"
      time="2024-05-26T14:22:30Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T14:22:30Z" level=debug msg="Attempting to download file from managed account"
      ....
      time="2024-05-26T14:23:37Z" level=error msg="Bootstrap failed to complete: timed out waiting for the condition"
      time="2024-05-26T14:23:37Z" level=error msg="Failed to wait for bootstrapping to complete. This error usually happens when there is a problem with control plane hosts that prevents the control plane operators from creating the control plane."
      time="2024-05-26T14:23:37Z" level=info msg="Bootstrap gather logs captured here \"/tmp/installer/log-bundle-20240526142227.tar.gz\""
      
      But when checking bootkube.log file, bootkube service was finished at 14:16:01, while installer started to gather bootstrap log bundle at 14:22:27.
      ====
      May 26 14:13:49 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap bootkube.sh[8158]: Check if API URL is reachable during bootstrap 
      May 26 14:13:49 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap bootkube.sh[8158]: Checking if api.ci-op-k09pvbzh-4ce3b.qe.azure.devcluster.openshift.com of type API_URL reachable
      May 26 14:15:57 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap bootkube.sh[8158]: Unable to reach API_URL's https endpoint at https://api.ci-op-k09pvbzh-4ce3b.qe.azure.devcluster.openshift.com:6443/version
      May 26 14:15:57 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap bootkube.sh[8158]: Unable to validate. https://api.ci-op-k09pvbzh-4ce3b.qe.azure.devcluster.openshift.com:6443/version is currently unreachable.
      May 26 14:15:58 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap bootkube.sh[8158]: Check if API-Int URL is reachable during bootstrap
      May 26 14:15:58 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap bootkube.sh[8158]: Checking if api-int.ci-op-k09pvbzh-4ce3b.qe.azure.devcluster.openshift.com of type API_INT_URL reachable
      May 26 14:16:01 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap bootkube.sh[8158]: Success while trying to reach API_INT_URL's https endpoint at https://api-int.ci-op-k09pvbzh-4ce3b.qe.azure.devcluster.openshift.com:6443/version
      May 26 14:16:01 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap bootkube.sh[8158]: bootkube.service complete
      May 26 14:16:01 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap systemd[1]: bootkube.service: Deactivated successfully.
      May 26 14:16:01 ci-op-k09pvbzh-4ce3b-hzn5k-bootstrap systemd[1]: bootkube.service: Consumed 5.316s CPU time.
      
      
      Another job's .openshift-install.log, we could see debug message "Bootstrap status: complete" during the installer gathered bootstrap log bundles.
      ====
      time="2024-05-26T04:46:34Z" level=info msg="Waiting up to 20m0s (until 5:06AM UTC) for the Kubernetes API at https://api.ci-op-wkbicr3q-cad5f.qe.azure.devcluster.openshift.com:6443..."
      time="2024-05-26T04:46:34Z" level=debug msg="Loading Agent Config..."
      time="2024-05-26T04:46:49Z" level=info msg="API v1.29.5+87992f4 up"
      time="2024-05-26T04:46:49Z" level=debug msg="Loading Install Config..."
      time="2024-05-26T04:46:49Z" level=debug msg="  Loading SSH Key..."
      time="2024-05-26T04:46:49Z" level=debug msg="  Loading Base Domain..."
      time="2024-05-26T04:46:49Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T04:46:49Z" level=debug msg="  Loading Cluster Name..."
      time="2024-05-26T04:46:49Z" level=debug msg="    Loading Base Domain..."
      time="2024-05-26T04:46:49Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T04:46:49Z" level=debug msg="  Loading Pull Secret..."
      time="2024-05-26T04:46:49Z" level=debug msg="  Loading Platform..."
      time="2024-05-26T04:46:49Z" level=debug msg="Using Install Config loaded from state file"
      time="2024-05-26T04:46:49Z" level=info msg="Waiting up to 30m0s (until 5:16AM UTC) for bootstrapping to complete..."
      time="2024-05-26T04:47:19Z" level=debug msg="OpenShift Installer v4.16.0"
      time="2024-05-26T04:47:19Z" level=debug msg="Built from commit 0d8e78ed9fc28a7138e880e14f54f901a75e6414"
      time="2024-05-26T04:47:19Z" level=info msg="Waiting up to 20m0s (until 5:07AM UTC) for the Kubernetes API at https://api.ci-op-wkbicr3q-cad5f.qe.azure.devcluster.openshift.com:6443..."
      time="2024-05-26T04:47:19Z" level=debug msg="Loading Agent Config..."
      time="2024-05-26T04:47:19Z" level=info msg="API v1.29.5+87992f4 up"
      time="2024-05-26T04:47:19Z" level=debug msg="Loading Install Config..."
      time="2024-05-26T04:47:19Z" level=debug msg="  Loading SSH Key..."
      time="2024-05-26T04:47:19Z" level=debug msg="  Loading Base Domain..."
      time="2024-05-26T04:47:19Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T04:47:19Z" level=debug msg="  Loading Cluster Name..."
      time="2024-05-26T04:47:19Z" level=debug msg="    Loading Base Domain..."
      time="2024-05-26T04:47:19Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T04:47:19Z" level=debug msg="  Loading Pull Secret..."
      time="2024-05-26T04:47:19Z" level=debug msg="  Loading Platform..."
      time="2024-05-26T04:47:19Z" level=debug msg="Using Install Config loaded from state file"
      time="2024-05-26T04:47:19Z" level=info msg="Waiting up to 30m0s (until 5:17AM UTC) for bootstrapping to complete..."
      time="2024-05-26T05:16:49Z" level=debug msg="Fetching Bootstrap SSH Key Pair..."
      time="2024-05-26T05:16:49Z" level=debug msg="Loading Bootstrap SSH Key Pair..."
      time="2024-05-26T05:16:49Z" level=debug msg="Using Bootstrap SSH Key Pair loaded from state file"
      time="2024-05-26T05:16:49Z" level=debug msg="Reusing previously-fetched Bootstrap SSH Key Pair"
      time="2024-05-26T05:16:49Z" level=debug msg="Fetching Install Config..."
      time="2024-05-26T05:16:49Z" level=debug msg="Loading Install Config..."
      time="2024-05-26T05:16:49Z" level=debug msg="  Loading SSH Key..."
      time="2024-05-26T05:16:49Z" level=debug msg="  Loading Base Domain..."
      time="2024-05-26T05:16:49Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T05:16:49Z" level=debug msg="  Loading Cluster Name..."
      time="2024-05-26T05:16:49Z" level=debug msg="    Loading Base Domain..."
      time="2024-05-26T05:16:49Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T05:16:49Z" level=debug msg="  Loading Pull Secret..."
      time="2024-05-26T05:16:49Z" level=debug msg="  Loading Platform..."
      time="2024-05-26T05:16:49Z" level=debug msg="Using Install Config loaded from state file"
      time="2024-05-26T05:16:49Z" level=debug msg="Reusing previously-fetched Install Config"
      time="2024-05-26T05:16:49Z" level=info msg="Pulling Cluster API manifests"
      time="2024-05-26T05:16:49Z" level=info msg="Failed to gather Cluster API manifests: either Cluster API manifests not generated or terraform provision"
      time="2024-05-26T05:16:49Z" level=info msg="Pulling VM console logs"
      time="2024-05-26T05:16:51Z" level=debug msg="Bootstrap status: complete"
      time="2024-05-26T05:16:51Z" level=debug msg="Loading Install Config..."
      time="2024-05-26T05:16:51Z" level=debug msg="  Loading SSH Key..."
      time="2024-05-26T05:16:51Z" level=debug msg="  Loading Base Domain..."
      time="2024-05-26T05:16:51Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T05:16:51Z" level=debug msg="  Loading Cluster Name..."
      time="2024-05-26T05:16:51Z" level=debug msg="    Loading Base Domain..."
      time="2024-05-26T05:16:51Z" level=debug msg="    Loading Platform..."
      time="2024-05-26T05:16:51Z" level=debug msg="  Loading Pull Secret..."
      time="2024-05-26T05:16:51Z" level=debug msg="  Loading Platform..."
      time="2024-05-26T05:16:51Z" level=debug msg="Using Install Config loaded from state file"
      time="2024-05-26T05:16:51Z" level=info msg="It is now safe to remove the bootstrap resources"
      time="2024-05-26T05:16:51Z" level=debug msg="Time elapsed per stage:"
      time="2024-05-26T05:16:51Z" level=debug msg="Bootstrap Complete: 29m31s"
      time="2024-05-26T05:16:51Z" level=info msg="Time elapsed: 29m31s"
      time="2024-05-26T05:16:52Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:52Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:52Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:52Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:53Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:53Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:53Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:53Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:53Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:53Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:53Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:53Z" level=debug msg="Attempting to download file from managed account"
      time="2024-05-26T05:16:53Z" level=info msg="Pulling debug logs from the bootstrap machine"
      time="2024-05-26T05:16:53Z" level=debug msg="Added /tmp/bootstrap-ssh4253570103 to installer's internal agent"
      time="2024-05-26T05:16:53Z" level=debug msg="Added /tmp/.ssh/ssh-privatekey to installer's internal agent"
      time="2024-05-26T05:16:53Z" level=info msg="Failed to gather bootstrap logs: failed to create SSH client: failed to use the provided keys for authentication: ssh: handshake failed: ssh: unable to authenticate, attempted methods [none publickey], no supported methods remain"
      ...

       

      Version-Release number of selected component (if applicable):

          recent 4.16 nightly build

      How reproducible:

          sometimes

      Steps to Reproduce:

          1. Install azure ipi cluster against 4.16
          2.
          3.
          

      Actual results:

          Installer timeout during waiting-for bootstrap complete, the time is little short to wait bootstrap complete.

      Expected results:

          Installation is completed

      Additional info:

          This might be related with change in https://issues.redhat.com//browse/OCPBUGS-33610 and https://issues.redhat.com/browse/OCPBUGS-30860

            Unassigned Unassigned
            jinyunma Jinyun Ma
            Jinyun Ma Jinyun Ma
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated: