Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-30774

installer log bundle should gather console logs even when ssh fails

XMLWordPrintable

    • No
    • Rejected
    • False
    • Hide

      None

      Show
      None
    • Hide
      * Previously, if the installation program failed to gather logs from the bootstrap due to an SSH connection issue, it would also not provide virtual machine (VM) serial console logs even if they were collected. With this update, the installation program provides VM serial console logs even if the SSH connection to the bootstrap machine fails. (link:https://issues.redhat.com/browse/OCPBUGS-30774[*OCPBUGS-30774*])
      Show
      * Previously, if the installation program failed to gather logs from the bootstrap due to an SSH connection issue, it would also not provide virtual machine (VM) serial console logs even if they were collected. With this update, the installation program provides VM serial console logs even if the SSH connection to the bootstrap machine fails. (link: https://issues.redhat.com/browse/OCPBUGS-30774 [* OCPBUGS-30774 *])
    • Bug Fix
    • Done

      Description of problem:

          When the installer gathers a log bundle after failure (either automatically or with gather bootstrap), the installer fails to return serial console logs if an SSH connection to the bootstrap node is refused. 
      
      Even if the serial console logs were collected, the installer exits on error if ssh connection is refused:
      
      time="2024-03-09T20:59:26Z" level=info msg="Pulling VM console logs"
      time="2024-03-09T20:59:26Z" level=debug msg="Search for matching instances by tag in us-west-1 matching aws.Filter{\"kubernetes.io/cluster/ci-op-4ygffz3q-be93e-jnn92\":\"owned\"}"
      time="2024-03-09T20:59:26Z" level=debug msg="Search for matching instances by tag in us-west-1 matching aws.Filter{\"openshiftClusterID\":\"2f9d8822-46fd-4fcd-9462-90c766c3d158\"}"
      time="2024-03-09T20:59:27Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-bootstrap" Instance=i-0413f793ffabe9339
      time="2024-03-09T20:59:27Z" level=debug msg="Download complete" Instance=i-0413f793ffabe9339
      time="2024-03-09T20:59:27Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-master-0" Instance=i-0ab5f920818366bb8
      time="2024-03-09T20:59:27Z" level=debug msg="Download complete" Instance=i-0ab5f920818366bb8
      time="2024-03-09T20:59:27Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-master-2" Instance=i-0b93963476818535d
      time="2024-03-09T20:59:27Z" level=debug msg="Download complete" Instance=i-0b93963476818535d
      time="2024-03-09T20:59:28Z" level=debug msg="Attemping to download console logs for ci-op-4ygffz3q-be93e-jnn92-master-1" Instance=i-0797728e092bfbeef
      time="2024-03-09T20:59:28Z" level=debug msg="Download complete" Instance=i-0797728e092bfbeef
      time="2024-03-09T20:59:28Z" level=info msg="Pulling debug logs from the bootstrap machine"
      time="2024-03-09T20:59:28Z" level=debug msg="Added /tmp/bootstrap-ssh3643557583 to installer's internal agent"
      time="2024-03-09T20:59:28Z" level=debug msg="Added /tmp/.ssh/ssh-privatekey to installer's internal agent"
      time="2024-03-09T21:01:39Z" level=error msg="Attempted to gather debug logs after installation failure: failed to connect to the bootstrap machine: dial tcp 13.57.212.80:22: connect: connection timed out"
      
      from: https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_api/1788/pull-ci-openshift-api-master-e2e-aws-ovn/1766560949898055680
      
      We can see the console logs were downloaded, they should be saved in the log bundle.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1. Failed install where SSH to bootstrap node fails. https://github.com/openshift/installer/pull/8137 provides a potential reproducer
          2.
          3.
          

      Actual results:

          

      Expected results:

          

      Additional info:

      Error handling needs to be reworked here: https://github.com/openshift/installer/blob/master/cmd/openshift-install/gather.go#L160-L190    

       

              rdossant Rafael Fonseca dos Santos
              padillon Patrick Dillon
              Yunfei Jiang Yunfei Jiang
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Created:
                Updated:
                Resolved: