-
Bug
-
Resolution: Done
-
Major
-
9.4.21.Final, 11.0.9.Final, 12.0.1.Final
-
None
When a node cannot process new messages, e.g. because of a long GC pause, it will discard the messages and later send an XMIT request asking the sender to retransmit the dropped messages.
The retransmission requests sent by UNICAST3 and NAKACK2 are controlled by two attributes: xmit_interval and max_xmit_req_size.
The configured xmit_interval is too small: 100ms, compared to the JGroups default of 500ms.
A reduced xmit_interval is good for retransmitting the last message sooner after a network error, but it is bad when the destination node is the one discarding the messages.
max_xmit_req_size is computed based on bundle size, and the computation yields a huge number: 67600.
Potentially, this could lead to XMIT requests so large that they won't fit in a UDP packet. Long before that, however, it leads to large overlaps between consecutive XMIT requests, forcing the sender to retransmit the same messages over and over.
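The overlap effect can be illustrated with a toy model. This is only a rough sketch, not JGroups code: the class XmitOverlapModel, the method totalRetransmitted, and the assumed processing rate are all hypothetical. It assumes the receiver asks for every still-missing message (capped at max_xmit_req_size) once per xmit_interval, that the sender retransmits everything requested, and that the receiver can only process a fixed number of messages per second.

```java
// Toy model (hypothetical, NOT JGroups code): counts how many messages the
// sender retransmits before a slow receiver has caught up.
final class XmitOverlapModel {

    /**
     * @param missing            messages the receiver dropped
     * @param processedPerSecond assumed receiver throughput (an assumption of this sketch)
     * @param xmitIntervalMs     interval between XMIT requests, in milliseconds
     * @param maxXmitReqSize     maximum number of messages requested per XMIT request
     * @return total messages retransmitted by the sender, including repeats
     */
    static long totalRetransmitted(int missing, int processedPerSecond,
                                   int xmitIntervalMs, int maxXmitReqSize) {
        // Messages the receiver manages to process between two XMIT requests.
        long perInterval = (long) processedPerSecond * xmitIntervalMs / 1000;
        if (missing < 0 || perInterval < 1 || maxXmitReqSize < 1) {
            throw new IllegalArgumentException("model needs positive sizes and progress per interval");
        }
        long remaining = missing;
        long retransmitted = 0;
        while (remaining > 0) {
            // The receiver requests everything still missing, up to the cap.
            long requested = Math.min(remaining, maxXmitReqSize);
            // The sender retransmits all requested messages.
            retransmitted += requested;
            // The receiver only absorbs part of them before the next request.
            remaining -= Math.min(requested, perInterval);
        }
        return retransmitted;
    }

    public static void main(String[] args) {
        int missing = 10_000;
        int rate = 10_000; // messages per second, chosen arbitrarily for the example
        System.out.println("100ms, cap 67600: " + totalRetransmitted(missing, rate, 100, 67_600));
        System.out.println("500ms, cap 67600: " + totalRetransmitted(missing, rate, 500, 67_600));
        System.out.println("100ms, cap 1000:  " + totalRetransmitted(missing, rate, 100, 1_000));
    }
}
```

With these made-up inputs, the model has the sender retransmit 55000 messages to deliver 10000 at a 100ms interval with the 67600 cap, 15000 at a 500ms interval, and 10000 when the cap matches what the receiver can absorb per interval. The cap of 1000 is purely illustrative and is not a recommended or actual setting.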