Red Hat Data Grid / JDG-6811

SSLSocket blocks on close()


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • RHDG 8.4.x CD

    Description

      While testing cross-site replication in Keycloak with multiple Gossip Routers, killing a single Gossip Router causes the cluster to halt.

      Further investigation revealed threads blocked in Socket.write for a long period (about 15 minutes, the default kernel timeout) because the kernel's TCP send queue was full. Because of a synchronization problem, Socket.close was never invoked to release them. See https://issues.redhat.com/browse/JGRP-2746
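      The first failure mode can be reproduced outside JGroups. The sketch below is not the JGroups TUNNEL code: it uses a plain java.net.Socket on loopback, and the class name BlockedWriteDemo is hypothetical. A peer accepts the connection but never reads, so a writer thread eventually blocks in write() once the send queue fills. Closing the socket from another thread is what releases the writer, with an IOException; without that close(), nothing in this sketch would ever release it.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

public class BlockedWriteDemo {

    /**
     * Fills the send queue of a loopback connection whose peer never reads,
     * then closes the socket from the calling thread. Returns the exception
     * that ended the writer thread, or null if it was still stuck after 5 s.
     */
    static Throwable runDemo() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) { // accepted but never read from

            AtomicReference<Throwable> failure = new AtomicReference<>();
            Thread writer = new Thread(() -> {
                byte[] chunk = new byte[64 * 1024];
                try {
                    OutputStream out = client.getOutputStream();
                    while (true) {
                        // Blocks once our send queue and the peer's
                        // receive buffer are both full.
                        out.write(chunk);
                    }
                } catch (IOException e) {
                    failure.set(e); // released by close() on the other thread
                }
            }, "blocked-writer");
            writer.setDaemon(true);
            writer.start();

            Thread.sleep(500);  // let the writer fill the queue and block
            client.close();     // plain Socket: close() releases the blocked writer
            writer.join(5_000);
            return writer.isAlive() ? null : failure.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writer ended with: " + runDemo());
    }
}
```

      With an SSLSocket in place of the plain socket, the close() call is the one that can itself hang, which is the JDK bug described next.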

      After the first issue was fixed, a second one surfaced: Socket.close itself blocked for SSLSocket instances. This is a JDK bug: https://bugs.openjdk.org/browse/JDK-8241239

      A workaround is to set the SO_LINGER socket option. SO_LINGER sets how long, in seconds, a call to close the socket waits for queued data to be transmitted before the socket is actually closed.

      The plan is to enable this option with a delay of 1 or 2 seconds. Data still in the TCP send queue may be lost, but this does not affect Infinispan, because the UNICAST3 and NAKACK protocols retransmit lost messages. (Infinispan JIRA to be created.)
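      In Java the workaround comes down to one call: SSLSocket inherits setSoLinger(boolean, int) from java.net.Socket. The sketch below enables it with a 1-second timeout, the lower end of the proposed delay. The names LingerWorkaround and enableLinger are hypothetical; this is not how JGroups or the Operator actually expose the option. It also assumes, as the workaround implies, that the JDK's SSLSocket.close() treats SO_LINGER as the bound on how long it waits.

```java
import java.io.IOException;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class LingerWorkaround {

    /** Hypothetical helper: enables SO_LINGER with a timeout in seconds. */
    static void enableLinger(Socket socket, int seconds) throws IOException {
        // close() now waits at most `seconds` for queued data to be sent;
        // after that the connection is reset and unsent data is dropped.
        socket.setSoLinger(true, seconds);
    }

    public static void main(String[] args) throws IOException {
        // SSLSocket extends java.net.Socket, so the same option applies.
        try (SSLSocket socket = (SSLSocket) SSLSocketFactory.getDefault().createSocket()) {
            System.out.println("default SO_LINGER: " + socket.getSoLinger()); // -1 means disabled
            enableLinger(socket, 1);
            System.out.println("SO_LINGER now: " + socket.getSoLinger());
        }
    }
}
```

      Per the java.net.SocketOptions documentation, once the linger timeout expires the socket is closed forcefully with a TCP RST, so any data still queued is discarded. That is the loss the UNICAST3/NAKACK retransmission is expected to cover.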

      Note: Operator changes are required to set SO_LINGER for the TUNNEL protocol (cross-site deployments).

    People

      Pedro Ruivo (pruivo@redhat.com)
      Votes: 0
      Watchers: 5