- Bug
- Resolution: Won't Do
- Major
- None
- 4.16
- Moderate
- No
- Rejected
- False
There is a class of problems where installation fails after the bootstrap node has been taken down and the API server, for one reason or another, is unavailable, which prevents running must-gather.
The installer has an existing gathering facility that can use the SSH back channel to gather logs, but it is only attempted while the bootstrap node is still up. This is because on most cloud platforms, nodes have non-public IPs and aren't directly accessible.
That isn't the case everywhere, though. On some platforms (metal specifically, but potentially others) it is possible to SSH directly to the nodes. In these situations it would be helpful if the installer tried to produce a log bundle directly. This should be relatively easy to do.
Another, more complicated option for cloud platforms is to have the installer create a bastion host (effectively a temporary bootstrap) and gather node logs through it. In case of a catastrophic post-bootstrap failure where the installer detects that the API server is unavailable, it could try a procedure like the one documented in https://docs.openshift.com/container-platform/4.12/networking/accessing-hosts.html.
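The decision described above could look roughly like the sketch below. This is only an illustration of the proposed fallback order, not installer code: the function names, the strategy labels, and the choice to probe TCP port 22 are all assumptions made for the example, and the installer itself is not written in Python.

```python
import socket
from typing import Callable, Iterable, List, Tuple


def ssh_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: a successful TCP connect only shows the port is
    open, not that SSH authentication would succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_gather_strategy(
    bootstrap_up: bool,
    api_available: bool,
    node_addrs: Iterable[str],
    probe: Callable[[str], bool] = ssh_reachable,
) -> Tuple[str, List[str]]:
    """Pick a log-gathering path (strategy labels are made up for this sketch).

    Assumed order of preference:
      1. bootstrap still up      -> existing SSH gather via the bootstrap
      2. API server reachable    -> must-gather
      3. nodes reachable via SSH -> gather directly from those nodes
      4. otherwise               -> fall back to a temporary bastion host
    """
    if bootstrap_up:
        return "bootstrap-ssh", []
    if api_available:
        return "must-gather", []
    reachable = [addr for addr in node_addrs if probe(addr)]
    if reachable:
        return "direct-ssh", reachable
    return "bastion", []
```

For instance, on a metal cluster whose bootstrap is gone and whose API server is down, calling `choose_gather_strategy(False, False, node_ips)` would return `"direct-ssh"` together with whichever addresses answered the probe, and `"bastion"` if none did.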
Slack context: https://redhat-internal.slack.com/archives/C68TNFWA2/p1715693074978179
- is related to
- OCPBUGS-33370 masters.json not written when control plane provisioning fails
- POST