When a member receives a message, it adds the message's length to a byte count; when the count exceeds max_bytes, a STABLE message will be sent to the coordinator.
However, when DONT_LOOPBACK is set in a message M, M is not passed up the sender's stack, so the sender never adds M's length to its own byte count and max_bytes is not exceeded on the sender.
In a cluster of {A,B,C,D,E}, every member receives messages from everybody else and therefore increments its byte count, but (with DONT_LOOPBACK) not for its own messages. If, for example, C sends large messages and everybody else sends small (or no) messages, then the coordinator (A) will get STABLE messages from itself, B, D and E, but not from C. Because C's STABLE message is missing, no STABILITY message is sent, and the large messages from C remain in memory until STABLE.desired_avg_gossip kicks in and makes C send a STABLE message to A.
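The scenario above can be sketched with a small stand-alone model. This is not the JGroups STABLE implementation: the class StabilityVotes and its method onStable are hypothetical names, and the sketch assumes the coordinator sends STABILITY only once every member has sent a STABLE message.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the coordinator's bookkeeping; not the JGroups code.
public class StabilityVotes {
    private final Set<String> members;
    private final Set<String> votes = new HashSet<>();

    public StabilityVotes(Collection<String> members) {
        this.members = new HashSet<>(members);
    }

    /**
     * Records a STABLE message from sender. Returns true if all members
     * have now voted, i.e. the point at which STABILITY would be sent
     * (assumption: a vote from every member is required).
     */
    public boolean onStable(String sender) {
        if (!members.contains(sender))
            return false;            // ignore non-members
        votes.add(sender);
        if (votes.containsAll(members)) {
            votes.clear();           // start a new round
            return true;
        }
        return false;
    }
}
```

With members {A,B,C,D,E}, STABLE messages from A, B, D and E leave the round incomplete; only C's STABLE message completes it.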
Solution: in down(Message m), increment the byte count if DONT_LOOPBACK is set (we know we won't receive the message ourselves); if max_bytes is exceeded, send a STABLE message to the coordinator.
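A minimal sketch of the proposed accounting, as a simplified stand-alone model rather than the actual STABLE protocol code. The class StableByteCounter and its methods are hypothetical; resetting the count after a STABLE message is sent is an assumption, and a counter stands in for the real send to the coordinator.

```java
// Hypothetical model of per-member byte accounting; not the JGroups code.
public class StableByteCounter {
    private final long maxBytes;
    private long numBytes;
    private int stableMessagesSent; // stands in for sending STABLE to the coordinator

    public StableByteCounter(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** A message of the given length was passed up the stack. */
    public void up(int length) {
        add(length);
    }

    /**
     * A message of the given length is being sent. It is counted here only
     * if DONT_LOOPBACK is set, because then up() will never see it.
     */
    public void down(int length, boolean dontLoopback) {
        if (dontLoopback)
            add(length);
    }

    public int stableMessagesSent() {
        return stableMessagesSent;
    }

    private void add(int length) {
        numBytes += length;
        if (numBytes > maxBytes) {
            stableMessagesSent++;   // real code would send STABLE here
            numBytes = 0;           // assumption: the count restarts after sending
        }
    }
}
```

Messages sent without DONT_LOOPBACK are deliberately not counted in down(), since they are looped back and counted once in up().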
Unit test (STABLE_Test):
- {A,B,C}, disable desired_avg_gossip
- B sends N large messages with DONT_LOOPBACK, which should trigger multiple STABLE messages to A (but doesn't)
- Assert that a STABILITY message has been received, purging all delivered messages (this will fail until
JGRP-2605 has been resolved)