Type: Bug
Resolution: Done
Priority: Blocker
During cross-site testing in Keycloak with multiple Gossip Routers, killing a single Gossip Router led to a cluster halt.
Further investigation revealed threads blocked in Socket.write for a long period (~15 minutes, the default kernel timeout) because the kernel's TCP send queue was full. The threads stayed blocked because a synchronization problem prevented Socket.close from being invoked. See https://issues.redhat.com/browse/JGRP-2746
After the first issue was fixed, a second one arose: Socket.close blocked for SSLSockets. This one is a JDK bug: https://bugs.openjdk.org/browse/JDK-8241239
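The failure mode above can be illustrated with plain `java.net` sockets on loopback. This is a minimal sketch, not the JGroups RouterStub code: a writer floods a peer that never reads until the kernel send buffers fill and `write` blocks, and the writer is only released when another thread calls `close()` on the socket. The class and method names (`BlockedWriteDemo`, `closeUnblocksWriter`) are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicReference;

public class BlockedWriteDemo {

    /**
     * Floods a peer that never reads, closes the writing socket from the
     * calling thread after closeAfterMs, and returns the IOException the
     * writer observed, or null if the writer is still stuck after joinTimeoutMs.
     */
    public static IOException closeUnblocksWriter(long closeAfterMs, long joinTimeoutMs) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket peer = server.accept()) { // peer never reads, so the send queue fills up

            AtomicReference<IOException> failure = new AtomicReference<>();
            Thread writer = new Thread(() -> {
                byte[] chunk = new byte[64 * 1024];
                try {
                    OutputStream out = client.getOutputStream();
                    while (true) {
                        out.write(chunk); // blocks once the kernel buffers are full
                    }
                } catch (IOException e) {
                    failure.set(e); // released by close() from another thread
                }
            });
            writer.setDaemon(true); // do not keep the JVM alive if the writer stays stuck
            writer.start();

            Thread.sleep(closeAfterMs);
            // close() is called without taking any lock the writer holds. If it
            // had to acquire such a lock, it could never run while the writer is
            // blocked: one way a synchronization problem keeps close() from being invoked.
            client.close();

            writer.join(joinTimeoutMs);
            return writer.isAlive() ? null : failure.get();
        }
    }

    public static void main(String[] args) throws Exception {
        IOException e = closeUnblocksWriter(500, 5000);
        System.out.println("writer released with: " + e);
    }
}
```

Running `main` should print the `SocketException` the writer received once the socket was closed, rather than leaving the thread blocked until the kernel timeout.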
A workaround is to set the SO_LINGER TCP option. SO_LINGER sets how long, in seconds, a call to close the socket waits for queued data to be transmitted before the socket is closed.
The idea is to enable this option with a linger timeout of 1 or 2 seconds. Data still in the TCP send queue may be lost, but this does not affect Infinispan: the UNICAST3/NAKACK protocols ensure that messages are retransmitted. (Infinispan JIRA to be created)
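At the JDK level, the option is set with `Socket.setSoLinger(boolean, int)`, which `SSLSocket` inherits. The sketch below enables it with a 2-second timeout on a plain, unconnected `java.net.Socket`; the helper name `enableLinger` is hypothetical, and how JGroups' TUNNEL or the Infinispan Operator exposes this setting is not shown here.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;

public class LingerConfig {

    /** Hypothetical helper: enable SO_LINGER with the given timeout in seconds. */
    public static Socket enableLinger(Socket socket, int seconds) throws SocketException {
        // close() waits at most `seconds` for queued data to be sent; data still
        // in the send queue after that may be lost. setSoLinger rejects negative
        // values with IllegalArgumentException.
        socket.setSoLinger(true, seconds);
        return socket;
    }

    public static void main(String[] args) throws IOException {
        try (Socket s = new Socket()) { // unconnected: options can be set before connect()
            enableLinger(s, 2);
            System.out.println("SO_LINGER = " + s.getSoLinger());
        }
    }
}
```

A short timeout bounds how long `close()` can hang, trading a possible loss of unsent data for liveness; the retransmission protocols mentioned above cover that loss.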
// Operator changes required to set SO_LINGER for TUNNEL (cross-site deployment)
is related to:
- JGRP-2753 GossipRouter: heartbeating (Resolved)
- JGRP-2748 SSLSocket blocks on close() (Resolved)
- JGRP-2751 TCP with SSLSockets: check behavior on close() (Resolved)
- JGRP-2752 GossipRouter doesn't use some attributes (Resolved)
- JDG-6828 [Operator] Configure Gossip Route idle connection timeout (Verified)
- ISPN-15428 Upgrade JGroups to 5.2.21.Final (Resolved)
- JGRP-2746 RouterStubManager/RouterStub: remove unneeded synchronization (Resolved)
relates to:
- ISPN-15501 Add options to configure JGroups bundler and TCP SO_LINGER (Resolved)