JGroups / JGRP-2748

SSLSocket blocks on close()

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 5.3.2, 5.2.21

      We have a client and a server, both using TLS (SSLSocket / SSLServerSocket) on JDK 11.

      • The client is sending a lot of data
      • The server is killed without initiating the FIN|ACK sequence: this can happen e.g. if an intermediate switch crashes, iptables discards input, a power/ethernet plug is pulled or a pod is killed with --force.
      • The client keeps sending until it blocks on the TCP write because the send window is full:
          java.lang.Thread.State: RUNNABLE
        	  at java.net.SocketOutputStream.socketWrite0(SocketOutputStream.java:-1)
        	  at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:110)
        	  at java.net.SocketOutputStream.write(SocketOutputStream.java:150)
        	  at sun.security.ssl.SSLSocketOutputRecord.deliver(SSLSocketOutputRecord.java:345)
        	  at sun.security.ssl.SSLSocketImpl$AppOutputStream.write(SSLSocketImpl.java:1309)
        	  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:123)
        
      • Method deliver acquires SSLSocketOutputRecord.recordLock, but then blocks on the TCP write
      • After heartbeat_timeout milliseconds, the client tries to close the connection to the server
      • However, close() blocks because recordLock is still held by the blocked write:
        	  at jdk.internal.misc.Unsafe.park(Unsafe.java:-1)
        	  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:194)
        	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885)
        	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917)
        	  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240)
        	  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:267)
        	  at sun.security.ssl.SSLSocketImpl.closeNotify(SSLSocketImpl.java:736)
        	  at sun.security.ssl.SSLSocketImpl.duplexCloseOutput(SSLSocketImpl.java:662)
        	  at sun.security.ssl.SSLSocketImpl.close(SSLSocketImpl.java:585)
        
      • This is caused by [1]
      • The client stays blocked in close() until TCP retransmission gives up and tears down the connection (about 15 minutes with Linux defaults)

      Solution

      The code in SSLSocketImpl.closeNotify() hangs trying to acquire recordLock (code abridged):

          void closeNotify(boolean useUserCanceled) throws IOException {
              int linger = getSoLinger();
              if (linger >= 0) {
                  boolean interrupted = Thread.interrupted();
                  try {
                      if (conContext.outputRecord.recordLock.tryLock() ||
                              conContext.outputRecord.recordLock.tryLock(
                                      linger, TimeUnit.SECONDS)) {
                          try {
                              handleClosedNotifyAlert(useUserCanceled);
                          } finally {
                              conContext.outputRecord.recordLock.unlock();
                          }
                      } else {
                          if (!super.isOutputShutdown()) {
                              if (isLayered() && !autoClose) {
                                  throw new SSLException(
                                          "SO_LINGER timeout, " +
                                          "close_notify message cannot be sent.");
                              } else {
                                  super.shutdownOutput();
                              }
                          }
                          conContext.conSession.invalidate();
                      }
                  } catch (InterruptedException ex) {
                      interrupted = true;
                  }
                  if (interrupted) {
                      Thread.currentThread().interrupt();
                  }
              } else {
                  conContext.outputRecord.recordLock.lock();
                  try {
                      handleClosedNotifyAlert(useUserCanceled);
                  } finally {
                      conContext.outputRecord.recordLock.unlock();
                  }
              }
          }
      

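      To avoid the outer else branch, which calls recordLock.lock() and waits indefinitely, we set the socket's linger time to 0: with linger >= 0 the first branch is taken, and while the blocked write holds the lock both tryLock() and tryLock(timeout) fail immediately (after 0 seconds), so close() no longer hangs and the socket is released.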
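      The locking behaviour described above can be reproduced without any sockets. The following is only a sketch: LingerLockDemo, tryClose() and closeWhileWriterBlocked() are made-up names, and the static recordLock is a plain ReentrantLock standing in for the JDK's internal lock, not the real field. It mimics the linger >= 0 branch while another thread holds the lock, the way a write stuck in deliver() would.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class LingerLockDemo {
    // Stand-in for SSLSocketOutputRecord.recordLock (illustration only, not the JDK field)
    static final ReentrantLock recordLock = new ReentrantLock();

    /** Mimics the linger >= 0 branch: true if the lock was obtained within lingerSeconds. */
    static boolean tryClose(int lingerSeconds) throws InterruptedException {
        if (recordLock.tryLock() || recordLock.tryLock(lingerSeconds, TimeUnit.SECONDS)) {
            try {
                return true;   // the real code would send close_notify here
            } finally {
                recordLock.unlock();
            }
        }
        return false;          // lock busy: the real code shuts down the output instead
    }

    /** Calls tryClose() while a second thread holds the lock, like a blocked TCP write. */
    static boolean closeWhileWriterBlocked(int lingerSeconds) throws InterruptedException {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            recordLock.lock();
            try {
                held.countDown();
                release.await();   // simulates the write that never completes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                recordLock.unlock();
            }
        });
        writer.start();
        held.await();              // make sure the writer owns the lock first
        try {
            return tryClose(lingerSeconds);
        } finally {
            release.countDown();   // let the simulated writer finish
            writer.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        boolean acquired = closeWhileWriterBlocked(0);
        long ms = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println("lock acquired: " + acquired + ", returned after ~" + ms + " ms");
    }
}
```

      With a linger of 0 the call returns false right away; replacing the tryLock() calls with a plain lock() would make the closer wait for as long as the writer holds the lock, which is the hang seen in the second stack trace.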

      So in TcpConnection, SO_LINGER is set to 0 if the socket is an SSLSocket. It's a bit of a kludge, but short of implementing SSLEngine in TCP_NIO2, it's the best solution for now.
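      A minimal sketch of that workaround, assuming a helper along these lines: SslLinger and disableLingerIfSsl() are hypothetical names for illustration and not the actual TcpConnection code. Only Socket.setSoLinger() and the instanceof check reflect what is described above.

```java
import java.io.IOException;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class SslLinger {
    /**
     * Hypothetical helper: sets SO_LINGER to 0 for TLS sockets only,
     * leaving plain sockets untouched. Returns true if the option was set.
     */
    static boolean disableLingerIfSsl(Socket sock) throws IOException {
        if (sock instanceof SSLSocket) {
            sock.setSoLinger(true, 0);   // linger enabled with a 0-second timeout
            return true;
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Unconnected sockets are enough to show which ones get the option
        try (Socket plain = new Socket();
             Socket tls = SSLSocketFactory.getDefault().createSocket()) {
            System.out.println("plain socket changed: " + disableLingerIfSsl(plain));
            System.out.println("TLS socket changed:   " + disableLingerIfSsl(tls)
                    + ", SO_LINGER=" + tls.getSoLinger());
        }
    }
}
```

      Note that a linger time of 0 typically makes close() discard unsent data and reset the connection rather than shut it down gracefully, which is part of why this is a workaround rather than a clean fix.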

      [1] https://bugs.openjdk.org/browse/JDK-8241239

            rhn-engineering-bban Bela Ban