-
Bug
-
Resolution: Done
-
Critical
-
JDG 7.2.3 GA
-
None
When a peer is non-responsive (without closing its socket), a TcpConnection.send() can block on a write (state is RUNNABLE!).
The problem is that the TcpConnection cannot be closed either, as TcpConnection.close() tries to acquire the same lock that is already held by TcpConnection.send().
See the stack trace below for a sample scenario.
The use case is this one:
- Say we have nodes A (coord), B and C
- There's heavy (clustering) traffic to all 3 nodes, from the 2 clients
- B is isolated by executing 'ifdown bond0'
- At this point, the messages going to B will back up at (say) A because A doesn't get any TCP acks from B
- At some point, depending on the traffic and the size of the sent messages, A will acquire a lock on the send connection to B, to write data, but the write will block as the TCP send-window to B is full (note that the sender thread will still be in state RUNNABLE!)
- After 40s, A suspects B and emits a new view {A,C}
- The view change should cause A's connection to B to be closed and subsequently removed. However, the close never completes, because it needs to acquire the connection lock, which is still held by the blocked TCP write
"main" #1 prio=5 os_prio=31 tid=0x00007fbbd3802000 nid=0x2303 runnable [0x0000700009793000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketOutputStream.socketWrite0(Native Method)
    at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
    at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:122)
    - locked <0x000000079e790a50> (a java.io.BufferedOutputStream)
    at java.io.DataOutputStream.write(DataOutputStream.java:107)
    - locked <0x000000079e790838> (a java.io.DataOutputStream)
    at org.jgroups.blocks.cs.TcpConnection.doSend(TcpConnection.java:161)
    at org.jgroups.blocks.cs.TcpConnection.send(TcpConnection.java:131)
    at org.jgroups.blocks.cs.TcpClient.send(TcpClient.java:103)
    at org.jgroups.tests.bla6.main(bla6.java:35)

"Thread-2" #15 prio=5 os_prio=31 tid=0x00007fbbd2150800 nid=0x6503 waiting on condition [0x000070000bcf6000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x000000079e7871a8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
    at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
    at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
    at org.jgroups.blocks.cs.TcpConnection.close(TcpConnection.java:358)
    at org.jgroups.util.Util.close(Util.java:422)
    at org.jgroups.blocks.cs.TcpClient.stop(TcpClient.java:85)
    at org.jgroups.blocks.cs.BaseServer.close(BaseServer.java:147)
    at org.jgroups.util.Util.close(Util.java:422)
    at org.jgroups.tests.bla6.lambda$main$0(bla6.java:27)
    at org.jgroups.tests.bla6$$Lambda$1/1384010761.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:748)
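
The locking pattern visible in the two threads above can be reproduced in isolation. The following is a minimal sketch, not the JGroups code and not necessarily how the issue was fixed: `FakeConnection`, `closeWithLock()` and `closeWithoutLock()` are hypothetical names, and the blocked socket write is simulated with a `CountDownLatch` rather than a real full TCP send-window. It shows that a close which needs the send lock cannot make progress while the write is stuck, whereas a close that does not take the lock can release the writer. (With a real `java.net.Socket`, closing the socket from another thread makes a blocked write fail with a `SocketException`.)

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class BlockedSendDemo {

    // Hypothetical stand-in for a connection; this is NOT org.jgroups.blocks.cs.TcpConnection.
    static class FakeConnection {
        private final ReentrantLock sendLock = new ReentrantLock();
        // Counted down only when the "socket" is closed; models a write stuck on a full send-window.
        private final CountDownLatch socketClosed = new CountDownLatch(1);
        final CountDownLatch writeStarted = new CountDownLatch(1);

        void send() throws InterruptedException {
            sendLock.lock();
            try {
                writeStarted.countDown();
                socketClosed.await(); // simulated blocking write, lock is held throughout
            } finally {
                sendLock.unlock();
            }
        }

        // Same shape as the problematic pattern: close needs the lock held by send().
        // tryLock with a timeout is used so the demo terminates; a plain lock() would park forever.
        boolean closeWithLock(long timeoutMs) throws InterruptedException {
            if (!sendLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS))
                return false;
            try {
                socketClosed.countDown();
                return true;
            } finally {
                sendLock.unlock();
            }
        }

        // One possible alternative (an assumption): close the socket without taking the send lock.
        void closeWithoutLock() {
            socketClosed.countDown();
        }
    }

    // Returns {close via lock succeeded, sender unblocked after lock-free close}.
    static boolean[] run() throws InterruptedException {
        FakeConnection conn = new FakeConnection();
        Thread sender = new Thread(() -> {
            try {
                conn.send();
            } catch (InterruptedException ignored) {
            }
        });
        sender.setDaemon(true);
        sender.start();
        conn.writeStarted.await();                    // sender now holds the lock and is "writing"

        boolean lockedClose = conn.closeWithLock(200); // cannot acquire the lock
        conn.closeWithoutLock();                       // releases the simulated write
        sender.join(2000);
        return new boolean[] { lockedClose, !sender.isAlive() };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = run();
        System.out.println("close via lock succeeded: " + r[0]);
        System.out.println("sender unblocked after lock-free close: " + r[1]);
    }
}
```

In this sketch the lock-based close times out and returns false, and the sender thread only finishes once the lock-free close runs.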
- clones
-
JGRP-2350 TCP: connection close can block when send() block on full TCP send-window
- Resolved
- relates to
-
ISPN-10295 Upgrade to JGroups 4.0.20.Final
- Closed