- Enhancement
- Resolution: Done
- Major
When we have view {A,B} and B leaves, then the following happens in UNICAST3:
- B sends a LEAVE-REQ to A
- A sends a LEAVE-RSP to B
- Because the LEAVE-RSP is sent reliably, A keeps retransmitting it to B until it receives an ACK for it
- However, when B receives the LEAVE-RSP, it only marks its connection as needing an ACK; the ACK itself is sent back to A when the next retransmission kicks in
- B leaves before this happens, so the ACK is never sent to A
- A keeps resending the LEAVE-RSP until max_retransmit_time (default=60s) elapses. Even then, the connection is only closed after another 60s (the default for conn_close_timeout).
- Because the reaper removes all entries from logical_addr_cache before that happens, we're seeing the above warnings
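The timing above can be sketched with a little arithmetic. This is only an illustration, not JGroups code: the class and method names are made up, and the 500 ms retransmission interval is an assumption that the issue does not state. Only the two 60s defaults come from the description.

```java
public class LeaveRetransmitTimeline {
    // Assumed retransmission interval in ms; NOT taken from the issue text
    static final long ASSUMED_XMIT_INTERVAL_MS = 500;

    /** Number of LEAVE-RSP retransmissions A attempts before it gives up. */
    static long retransmissions(long maxRetransmitTimeMs, long xmitIntervalMs) {
        return maxRetransmitTimeMs / xmitIntervalMs;
    }

    /** Time from the first LEAVE-RSP until A's connection to B is actually closed. */
    static long timeUntilClosedMs(long maxRetransmitTimeMs, long connCloseTimeoutMs) {
        return maxRetransmitTimeMs + connCloseTimeoutMs;
    }

    public static void main(String[] args) {
        long maxRetransmitTimeMs = 60_000; // default named in the issue
        long connCloseTimeoutMs = 60_000;  // default named in the issue

        System.out.println("retransmissions to the departed member: "
                + retransmissions(maxRetransmitTimeMs, ASSUMED_XMIT_INTERVAL_MS));
        System.out.println("connection lingers for (ms): "
                + timeUntilClosedMs(maxRetransmitTimeMs, connCloseTimeoutMs));
    }
}
```

With the defaults, A's connection to B lingers for about two minutes after B has gone, which leaves plenty of time for B's cache entry to be reaped first.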
SOLUTION:
- Send the LEAVE-RSP unreliably. This bypasses UNICAST3 altogether. If the LEAVE-RSP message is dropped, the leaver won't block forever, but only for 3 x GMS.leave_timeout ms
- Also add an MBR_LEFT event, which GMS sends up and down the stack when a member has left gracefully. This allows UNICAST3 to close the connection to that member immediately, stopping unneeded retransmissions to members which have left.
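A rough sketch of the two ideas in the solution, for illustration only. It does not use the JGroups API: `MemberLeftSketch`, `Connection`, `memberLeft` and `maxLeaveBlockMs` are hypothetical names, members are plain strings, and the table of unacked messages merely stands in for whatever state UNICAST3 keeps per destination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MemberLeftSketch {
    /** Hypothetical per-member connection holding messages that still await an ACK. */
    static final class Connection {
        final List<String> unacked = new ArrayList<>();
    }

    // Hypothetical send table: destination member -> connection
    private final Map<String, Connection> sendTable = new ConcurrentHashMap<>();

    /** Reliable send: the message stays queued for retransmission until it is acked. */
    void send(String dest, String msg) {
        // Single-threaded sketch; a real implementation would guard the list
        sendTable.computeIfAbsent(dest, d -> new Connection()).unacked.add(msg);
    }

    /** Reaction to an MBR_LEFT-style event: drop the connection right away. */
    void memberLeft(String member) {
        sendTable.remove(member);
    }

    /** Messages that would still be retransmitted on the next run. */
    int pendingRetransmissions() {
        return sendTable.values().stream().mapToInt(c -> c.unacked.size()).sum();
    }

    /** Upper bound on how long a leaver blocks if the unreliable LEAVE-RSP is lost. */
    static long maxLeaveBlockMs(long leaveTimeoutMs) {
        return 3 * leaveTimeoutMs;
    }

    public static void main(String[] args) {
        MemberLeftSketch a = new MemberLeftSketch();
        a.send("B", "some reliable message");
        a.memberLeft("B"); // B left gracefully, so nothing is retransmitted to it
        System.out.println("pending: " + a.pendingRetransmissions());
        System.out.println("max block (ms) for leave_timeout=1000: " + maxLeaveBlockMs(1000));
    }
}
```

The point of the design is that a graceful leave is a definite signal, so there is no reason to wait for retransmission and close timeouts to expire before cleaning up.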
- relates to JGRP-1815 TP: sending a message to a non-existent physical address takes too much time (Resolved)