OpenShift Bugs: OCPBUGS-60482

Windows nodes lose SSH access after a period of time


    • Bug
    • Resolution: Unresolved
    • Critical
    • 4.20.0
    • 4.18
    • Windows Containers
    • Quality / Stability / Reliability
    • False
    • 3
    • Important
    • WINC - Sprint 276
    • 1
    • Done
    • Bug Fix
      * Previously, the WMCO did not close SSH connections after finishing node reconciliation. As a consequence, after a new Windows node was added to a cluster, the node's SSH server would eventually refuse new connections because it was overwhelmed, causing node management issues. With this fix, the WMCO properly closes SSH connections. As a result, node SSH servers no longer refuse new connections because of this problem. (link:https://issues.redhat.com/browse/OCPBUGS-60482[*OCPBUGS-60482*])

      Description of problem:

          After a Windows node is provisioned and successfully added to an OpenShift cluster, it initially works fine and is able to take on workloads. Eventually, typically after several weeks, the node loses SSH access, which appears to lead to certificate errors. This issue has specifically been observed on GCP-based clusters.

      Version-Release number of selected component (if applicable):

          4.18

      How reproducible:

          Consistently

      Steps to Reproduce:

          1. Create a new cluster on GCP and configure a Windows node using the WMCO.
          2. Wait roughly a couple of weeks.
          3. Observe that the WMCO eventually logs "error instantiating SSH client" messages.
          

      Actual results:

          Windows nodes eventually lose SSH access, causing node management problems.

      Expected results:

          SSH access to Windows nodes continues to work as expected.

      Additional info:

          

              Sebastian Soto (rh-ee-ssoto)
              Luke Stanton (lstanton@redhat.com)
              Aharon Rasouli