OpenShift Bugs / OCPBUGS-33647

Installer gather should try to collect a log bundle even after bootstrap is gone

    • Moderate
    • No
    • Rejected
    • False

      There is a class of problems where installation fails after the bootstrap node has been taken down and the API server, for one reason or another, has also failed. With the API unavailable, must-gather cannot run.

      The installer has an existing gathering facility that can use the SSH back channel to collect logs, but it is only attempted while the bootstrap node is still up. This is because on most cloud platforms the nodes have non-public IPs and are not directly accessible.

      That isn't the case everywhere, though. On metal in particular, and potentially in other situations, it is possible to SSH directly to the nodes. In these situations it would be helpful if the installer tried to produce a log bundle directly. This should be relatively easy to do.
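
As a rough illustration of the "can we SSH directly?" check, here is a minimal Python sketch that probes whether a node accepts TCP connections on the SSH port. The function names `ssh_reachable` and `directly_reachable_nodes`, the default port 22, and the 3-second timeout are assumptions made for this example; they are not part of the installer, and a successful TCP connect does not prove that SSH authentication will succeed.

```python
import socket


def ssh_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only checks that something is listening,
    not that an SSH login would work.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def directly_reachable_nodes(hosts, probe=ssh_reachable):
    """Keep only the hosts for which the probe succeeds, preserving order."""
    return [h for h in hosts if probe(h)]
```

A gather step could run `directly_reachable_nodes` over the known control plane addresses and attempt a direct log bundle only when the result is non-empty.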

      Another, more complicated option for cloud platforms is to have the installer create a bastion host (effectively a temporary bootstrap) and gather node logs through it. After a catastrophic post-bootstrap failure, if the installer detects that the API server is unavailable, it could try a procedure like the one documented in https://docs.openshift.com/container-platform/4.12/networking/accessing-hosts.html.
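
The fallback order discussed in this issue could be sketched as a small decision function. This is only an illustration in Python: the `GatherStrategy` names, the `choose_gather_strategy` function, and the exact ordering of the checks are assumptions for this example and do not describe how the installer is implemented.

```python
from enum import Enum


class GatherStrategy(Enum):
    BOOTSTRAP_SSH = "bootstrap-ssh"  # gather through the bootstrap node (today's path)
    MUST_GATHER = "must-gather"      # API server is up, use the regular tooling
    DIRECT_SSH = "direct-ssh"        # nodes are reachable directly, e.g. on metal
    BASTION_SSH = "bastion-ssh"      # create a temporary bastion host first


def choose_gather_strategy(bootstrap_up: bool,
                           api_available: bool,
                           nodes_directly_reachable: bool) -> GatherStrategy:
    """Pick a log-gathering approach from three observed conditions.

    Hypothetical ordering: cheapest and least invasive options first,
    bastion creation last because it provisions new infrastructure.
    """
    if bootstrap_up:
        return GatherStrategy.BOOTSTRAP_SSH
    if api_available:
        return GatherStrategy.MUST_GATHER
    if nodes_directly_reachable:
        return GatherStrategy.DIRECT_SSH
    return GatherStrategy.BASTION_SSH
```

The bastion option is placed last in this sketch because it is the only one that creates new resources, which matches the issue's description of it as the more complicated option.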

      Slack context: https://redhat-internal.slack.com/archives/C68TNFWA2/p1715693074978179

              padillon Patrick Dillon
              stbenjam Stephen Benjamin
              Gaoyun Pei Gaoyun Pei