Bug
Resolution: Unresolved
4.22.0
Moderate
Description of problem:
While an OCP cluster is installing, we currently cannot SSH into the bootstrap node at all. Every attempt fails with the following error: "kex_exchange_identification: read: Connection reset by peer Connection reset by 51.224.118.32 port 22."
Gathering bootstrap logs, which also goes over SSH, hits the same issue and fails with a similar error: msg="Failed to gather bootstrap logs: failed to create SSH client: ssh: handshake failed: read tcp 172.24.247.41:35178->20.57.194.6:22: read: connection reset by peer". The resulting log bundle is empty.
Version-Release number of selected component (if applicable):
4.22.0 (seen on AWS and Azure, possibly other platforms as well)
How reproducible:
Always
Steps to Reproduce:
1. Use the openshift-installer from any 4.22 nightly payload
2. Start the cluster install and wait until it reaches the "waiting for bootstrap to complete" phase
3. Try to SSH into the bootstrap node using its public IP/DNS, or run openshift-install gather bootstrap
4. Observe the connection reset error
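To tell a reset at the TCP/banner stage apart from a later authentication failure without involving an SSH client, step 3 can be approximated with a plain socket probe. This is only a rough sketch, not part of the installer: the function name probe_ssh_banner and its return labels are made up for illustration, and it relies only on the fact that a healthy SSH server sends an identification string starting with "SSH-" right after the TCP connection opens.

```python
import socket


def probe_ssh_banner(host: str, port: int = 22, timeout: float = 5.0) -> str:
    """Classify how an SSH endpoint responds before key exchange starts.

    Returns "banner", "reset", "closed", "refused", "timeout" or "unexpected".
    (Hypothetical helper; the labels are illustrative only.)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # A working sshd sends its identification string first.
            data = sock.recv(256)
    except ConnectionResetError:
        # The symptom from this report: the peer resets the connection.
        return "reset"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    if data.startswith(b"SSH-"):
        return "banner"
    # An empty read means the peer closed the connection without a banner.
    return "closed" if not data else "unexpected"
```

Calling probe_ssh_banner with the bootstrap node's public IP would be expected to return "reset" while the problem is present and "banner" once sshd is healthy, e.g. after the reboot described under Additional info.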
Actual results:
The SSH connection fails with "connection reset by peer"
Expected results:
SSH should succeed
Additional info:
We observe the issue in installer presubmit jobs and in the blocking jobs of the release payload. The examples below contain the install log (with the error) and the empty log bundle.
- periodic-ci-ope[…]-ovn-techpreview/ipi-install-install/artifacts
- periodic-ci-ope[…]e-aws-ovn-serial/ipi-install-install/artifacts
- pr-logs/pull/openshift_installer/10335/pull-ci-openshift-installer-main-e2e-azure-default-config/2026359867026968576/artifacts/e2e-azure-default-config/ipi-install-install/artifacts
While debugging, I noticed that if the bootstrap node is rebooted (which essentially restarts the sshd process), SSH works just fine. The journal log for sshd since the last boot shows:
Feb 24 18:19:34 i-0128b33b58166aa6a sshd[11661]: -R not supported here
This is probably what leaves the SSH server in a degraded state, but I do not know why it started happening now.
Note: This problem does not cause the install to fail. However, it breaks the ability to gather bootstrap logs or to SSH into the bootstrap node for debugging.
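For comparing several affected nodes, the sshd entries can be pulled out of a saved journal dump with a small filter. This is a hedged sketch: it assumes the journal was saved in the default short format shown in the line above (for example from journalctl -b -u sshd), and the names JOURNAL_LINE and sshd_messages are illustrative only.

```python
import re

# Matches lines shaped like the journal excerpt above:
# "<Mon> <day> <HH:MM:SS> <host> <unit>[<pid>]: <message>"
JOURNAL_LINE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+"
    r"(?P<unit>[^\s\[]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def sshd_messages(journal_text: str) -> list:
    """Return (timestamp, pid, message) tuples for sshd journal lines."""
    found = []
    for line in journal_text.splitlines():
        match = JOURNAL_LINE.match(line.strip())
        # Keep only entries logged by the sshd process; skip anything else.
        if match and match.group("unit") == "sshd":
            found.append(
                (match.group("ts"), int(match.group("pid")), match.group("msg"))
            )
    return found
```

Filtering the result for messages equal to "-R not supported here" would show whether every failing node logs the same line before SSH starts resetting connections.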