When the user does not set max_xmit_req_size, UNICAST3 and NAKACK2 set it automatically based on the bundle size. That is, the maximum number of messages in an XMIT (retransmission) request is supposed to be the number of sequence numbers that would fit in a single bundle.
The calculation of estimated_max_msgs_in_xmit_req has a mistake: instead of dividing the bundle size by the size of a single sequence number, it does a multiplication. With a bundle size of 8500 (Infinispan default), max_xmit_req_size is set to 67600.
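To make the arithmetic concrete, here is a minimal sketch of the two calculations. It is not the actual JGroups code: the class and method names are hypothetical, and the 50-byte overhead subtracted from the bundle size is an assumption inferred from the figures quoted in this issue (67600 and 1056 both correspond to 8450 bytes, not 8500). The 64000 value for the TP default is likewise an assumed reading of "64k".

```java
// Hypothetical sketch; names and the 50-byte overhead are assumptions,
// chosen so that the results match the numbers quoted in this issue.
class XmitReqSizeSketch {
    // One sequence number is a Java long: 8 bytes.
    static final int SEQNO_SIZE = Long.BYTES;

    // ASSUMPTION: bytes reserved for overhead before sizing the request.
    static final int ASSUMED_OVERHEAD = 50;

    // What the issue describes as the mistake: multiplying by the seqno size.
    static int buggyEstimate(int maxBundleSize) {
        return (maxBundleSize - ASSUMED_OVERHEAD) * SEQNO_SIZE;
    }

    // What was intended: how many seqnos fit in one bundle (integer division).
    static int intendedEstimate(int maxBundleSize) {
        return (maxBundleSize - ASSUMED_OVERHEAD) / SEQNO_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(buggyEstimate(8500));     // 67600 (Infinispan bundle size)
        System.out.println(intendedEstimate(8500));  // 1056
        System.out.println(intendedEstimate(64000)); // 7993, i.e. roughly 8000
    }
}
```

Even the intended result is an order of magnitude above the proposed fixed default of 100.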
I believe that the default should be fixed to 100, because even the "correct" value, 1056, is too large (never mind 8000, the result with the TP default bundle size of 64k). When more than 1000 messages have been lost, the cluster is almost certainly under a lot of stress, and retransmitting all of them will take a long time. It is very likely that RetransmitTask will run again in 500ms, before the retransmission has finished, and will ask the sender to retransmit the same messages multiple times.
This is even worse in Infinispan, because we change the default xmit_interval from 500ms to 100ms. This makes it much more likely for the destination to send overlapping XMIT requests.
- is related to: JGRP-2534 Limit the number of threads processing retransmission requests (Closed)
- relates to: ISPN-12823 JGroups retransmission requests are too frequent and too large (Closed)