  1. OpenShift Bugs
  2. OCPBUGS-37540

[CAPI Azure] Failed to connect to bootstrap machine when gathering bootstrap log


    • Important
    • None
    • Proposed
    • False
    • N/A
    • Release Note Not Required
    • Done

      Description of problem:

      Created an Azure IPI cluster using CAPI, interrupted the installer while it was waiting for bootstrapping to complete, then ran "openshift-install gather bootstrap --dir <install_dir>" to gather the bootstrap logs.
      
      $ ./openshift-install gather bootstrap --dir ipi --log-level debug
      DEBUG OpenShift Installer 4.17.0-0.test-2024-07-25-014817-ci-ln-rcc2djt-latest 
      DEBUG Built from commit 91618bc6507416492d685c11540efb9ae9a0ec2e 
      ...
      DEBUG Looking for machine manifests in ipi/.clusterapi_output 
      DEBUG bootstrap manifests found: [ipi/.clusterapi_output/Machine-openshift-cluster-api-guests-jima25-m-4sq6j-bootstrap.yaml] 
      DEBUG found bootstrap address: 10.0.0.7            
      DEBUG master machine manifests found: [ipi/.clusterapi_output/Machine-openshift-cluster-api-guests-jima25-m-4sq6j-master-0.yaml ipi/.clusterapi_output/Machine-openshift-cluster-api-guests-jima25-m-4sq6j-master-1.yaml ipi/.clusterapi_output/Machine-openshift-cluster-api-guests-jima25-m-4sq6j-master-2.yaml] 
      DEBUG found master address: 10.0.0.4               
      DEBUG found master address: 10.0.0.5               
      DEBUG found master address: 10.0.0.6               
      ...
      DEBUG Added /home/fedora/.ssh/openshift-qe.pem to installer's internal agent 
      DEBUG Added /home/fedora/.ssh/id_rsa to installer's internal agent 
      DEBUG Added /home/fedora/.ssh/openshift-dev.pem to installer's internal agent 
      DEBUG Added /tmp/bootstrap-ssh2769549403 to installer's internal agent 
      INFO Failed to gather bootstrap logs: failed to connect to the bootstrap machine: dial tcp 10.0.0.7:22: connect: connection timed out 
      ...
      
      Checked Machine-openshift-cluster-api-guests-jima25-m-4sq6j-bootstrap.yaml in the CAPI artifact folder; it contains only the private IP.
      $ yq-go r Machine-openshift-cluster-api-guests-jima25-m-4sq6j-bootstrap.yaml status.addresses
      - type: InternalDNS
        address: jima25-m-4sq6j-bootstrap
      - type: InternalIP
        address: 10.0.0.7
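
To make the observation above concrete, here is a minimal sketch that classifies the addresses from that output as public or private. It is illustrative only and is not the installer's code; the helper name `split_addresses` is hypothetical, and the address list is copied by hand from the `status.addresses` output rather than parsed from the manifest.

```python
import ipaddress

# Addresses copied from the status.addresses output above.
BOOTSTRAP_ADDRESSES = [
    {"type": "InternalDNS", "address": "jima25-m-4sq6j-bootstrap"},
    {"type": "InternalIP", "address": "10.0.0.7"},
]


def split_addresses(addresses):
    """Split Machine status addresses into (public_ips, private_ips).

    Entries that are not IP literals (for example InternalDNS names)
    are skipped. The helper name is hypothetical.
    """
    public_ips, private_ips = [], []
    for entry in addresses:
        try:
            ip = ipaddress.ip_address(entry["address"])
        except ValueError:
            continue  # hostname, not an IP literal
        (private_ips if ip.is_private else public_ips).append(str(ip))
    return public_ips, private_ips


public_ips, private_ips = split_addresses(BOOTSTRAP_ADDRESSES)
print("public:", public_ips)
print("private:", private_ips)
```

For the manifest in this report, the public list is empty and the private list contains only 10.0.0.7, so a client outside the virtual network has no address in the artifact that it can reach.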
      
      According to https://github.com/openshift/installer/pull/8669/, the installer creates an inbound NAT rule that forwards port 22 on the public load balancer to the bootstrap host instead of assigning a public IP directly to the bootstrap machine. I verified that SSH login to the bootstrap server works through the frontend IP of the public load balancer. However, because no public IP is saved in the bootstrap machine's CAPI artifact, the installer tries to connect to the bootstrap machine on its private IP and fails.
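
The reachability difference described above can be checked by hand with a small TCP probe. This is a sketch under stated assumptions, not the installer's logic: the function names `can_reach_ssh` and `first_reachable` are hypothetical, and the frontend IP of the public load balancer is not recorded in this report, so the caller has to supply it.

```python
import socket


def can_reach_ssh(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and name resolution failures.
        return False


def first_reachable(candidates, port=22, timeout=5.0):
    """Return the first candidate host that accepts a TCP connection, or None."""
    for host in candidates:
        if can_reach_ssh(host, port, timeout):
            return host
    return None
```

Calling `first_reachable(["10.0.0.7", frontend_ip])` from outside the virtual network, where `frontend_ip` is the load balancer frontend address looked up separately, would be expected to skip the private address and return the frontend address, matching the manual SSH test described above.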

      Version-Release number of selected component (if applicable):

       4.17 nightly build   

      How reproducible:

       Always

      Steps to Reproduce:

          1. Create an Azure IPI cluster using CAPI
          2. Interrupt the installer while it is waiting for bootstrap to complete
          3. Run "openshift-install gather bootstrap --dir <install_dir>" to gather the bootstrap logs
          

      Actual results:

          Only serial console logs and local CAPI artifacts are collected. Logs from the bootstrap and control plane machines are not collected because the SSH connection to the bootstrap machine times out.

      Expected results:

          Bootstrap logs are gathered successfully.

      Additional info:

          

            padillon Patrick Dillon
            jinyunma Jinyun Ma
            Jinyun Ma Jinyun Ma