- Enhancement
- Resolution: Done
- Major
- 9.4.18.Final, 10.1.2.Final, 11.0.0.Alpha1
- None
- DataGrid Sprint #40
XSite backup commands usually need more processing on the receiving site than local cluster commands do on the receiving node, so channel.send(message) is much more likely to block.
UFC, UFC_NB, MFC and MFC_NB all block when there are not enough credits.
The _NB variants have an additional queue as a safety net, but that only delays the blocking: it's the same as increasing max_credits by max_queue_size, except with less work for UNICAST3/NAKACK2.
TCP and UDP also block if their send buffer is full. Using a bundler like transfer-queue instead of the default no-bundler will only delay the blocking until the bundler's queue is also full.
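The "only delays the blocking" point can be illustrated with plain java.util.concurrent, outside of JGroups. This is a minimal sketch, not JGroups code: the class and method names are hypothetical, and the ArrayBlockingQueue merely stands in for a bounded bundler or _NB queue whose consumer has stalled.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class; it does not use any JGroups API.
class BoundedQueueDemo {

    // Counts how many messages a bounded queue accepts when nothing drains it.
    static int acceptedBeforeFull(int capacity, int attempts) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        int accepted = 0;
        for (int i = 0; i < attempts; i++) {
            // offer() returns false exactly where put() would start blocking
            if (!queue.offer("msg-" + i)) {
                break;
            }
            accepted++;
        }
        return accepted;
    }

    public static void main(String[] args) {
        // With capacity 4 and no consumer, the 5th message is the first one refused.
        System.out.println(acceptedBeforeFull(4, 10)); // prints 4
    }
}
```

A larger capacity moves the point at which the sender stalls, but does not remove it as long as the consumer is slower than the producer.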
The biggest problem arises when xsite backup commands are sent from a JGroups thread and channel.send(message) blocks that thread. If the JGroups thread pool becomes full, it cannot process more messages, not even responses from the remote site.
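The starvation described above can be reproduced in miniature with a fixed-size executor. This is only a sketch under stated assumptions: the two-thread pool stands in for the JGroups thread pool, the one-slot queue for a full send path, and all names are hypothetical rather than taken from JGroups or Infinispan.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it does not use any JGroups API.
class PoolStarvationDemo {

    // Returns {response handled while senders were blocked, response handled after draining}.
    static boolean[] run() {
        BlockingQueue<String> sendQueue = new ArrayBlockingQueue<>(1);
        sendQueue.offer("already-queued");           // the send path is full from the start
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch responseHandled = new CountDownLatch(1);
        try {
            // Both pool threads try to send a backup command and block in put().
            for (int i = 0; i < 2; i++) {
                final int id = i;
                pool.execute(() -> {
                    try {
                        sendQueue.put("backup-" + id);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // The "response" has to wait for a free pool thread.
            pool.execute(responseHandled::countDown);
            boolean whileBlocked = responseHandled.await(200, TimeUnit.MILLISECONDS);

            // Draining one element unblocks one sender, which then handles the response.
            sendQueue.take();
            boolean afterDrain = responseHandled.await(5, TimeUnit.SECONDS);
            return new boolean[] { whileBlocked, afterDrain };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();                      // interrupts the sender that is still blocked
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("handled while blocked: " + result[0]);
        System.out.println("handled after drain:   " + result[1]);
    }
}
```

In the sketch the response is processed only once the send path makes progress. In the scenario reported here that progress depends on the other site, whose own threads may be blocked in the same way.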
JGroups creates temporary threads to process internal messages when its thread pool is full, but not even that can help when the other nodes' thread pools are also full:

"jgroups-temp-thread-5728,_ma267mlvjdg015:dal_mcom_perf" #11443 prio=5 os_prio=0 tid=0x000000000906f800 nid=0x26cb waiting on condition [0x00007fb0b7b0a000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for <0x00000005f3bce048> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.ArrayBlockingQueue.put(ArrayBlockingQueue.java:353)
    at org.jgroups.protocols.TransferQueueBundler.send(TransferQueueBundler.java:97)
    at org.jgroups.protocols.TP.send(TP.java:1441)
    at org.jgroups.protocols.TP._send(TP.java:1195)
    at org.jgroups.protocols.TP.down(TP.java:1111)
    ...
    at org.jgroups.protocols.FlowControl.sendCredit(FlowControl.java:480)
    at org.jgroups.protocols.FlowControl.handleCreditRequest(FlowControl.java:469)
    at org.jgroups.protocols.FlowControl.handleUpEvent(FlowControl.java:379)
    at org.jgroups.protocols.FlowControl.up(FlowControl.java:350)
- is related to
- JGRP-2452 UFC_NB/MFC_NB: blocks
- Resolved