  1. JGroups
  2. JGRP-1814

No physical address for X; dropping message


    • Enhancement
    • Resolution: Done
    • Major
    • 3.5

      When we have view {A,B} and B leaves, then the following happens in UNICAST3:

      • B sends a LEAVE-REQ to A
      • A sends a LEAVE-RSP to B
      • Because the LEAVE-RSP is reliable, A keeps sending the LEAVE-RSP to B until it receives an ACK for the LEAVE-RSP
      • However, when B receives the LEAVE-RSP, it only marks its connection as needing an ACK; the ACK for the LEAVE-RSP is sent back to A the next time the retransmission task runs
      • Before this happens, B leaves: the ACK is never sent to A
      • A keeps resending the LEAVE-RSP until max_retransmit_time (default=60s) elapses, and the connection is then only closed after another 60s (default for conn_close_timeout)
      • Because the reaper removes all entries of logical_addr_cache before that happens, A no longer has a physical address for B, and we're seeing the warning from the summary ("No physical address for X; dropping message")
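
      The timing in the steps above can be sketched as a small toy model. This is not JGroups code: the class and method names are hypothetical, the two 60s values are the defaults quoted above, and the point in time at which the reaper removes B from logical_addr_cache is an assumed input, since it is not stated here.

```java
// Toy model of the timeline described above; not JGroups code.
final class LeaveTimeline {
    // Defaults quoted in the issue text, in milliseconds.
    static final long MAX_RETRANSMIT_TIME = 60_000;
    static final long CONN_CLOSE_TIMEOUT  = 60_000;

    /** Time after B's departure at which A finally closes its connection to B. */
    static long connectionClosedAt() {
        return MAX_RETRANSMIT_TIME + CONN_CLOSE_TIMEOUT;
    }

    /**
     * Length of the window in which A's connection to B still exists although
     * B's physical address is already gone from the cache.
     * cacheEntryRemovedAt is an assumed input: the time (ms after B left)
     * at which the reaper dropped B from logical_addr_cache.
     */
    static long warningWindowMs(long cacheEntryRemovedAt) {
        return Math.max(0, connectionClosedAt() - cacheEntryRemovedAt);
    }

    public static void main(String[] args) {
        System.out.println("connection closed after ms: " + connectionClosedAt());
        System.out.println("warning window ms: " + warningWindowMs(30_000));
    }
}
```

      With the defaults, the connection to B lingers for up to 120s after B has left; any send attempted inside the window returned by warningWindowMs would hit the missing physical address.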

      SOLUTION:

      1. Send the LEAVE-RSP unreliably. This bypasses UNICAST3 altogether. If the LEAVE-RSP message is dropped, the leaver won't block forever, but only for 3 x GMS.leave_timeout ms
      2. Also add an MBR_LEFT event, which GMS sends up and down the stack when a member has left gracefully. This allows UNICAST3 to close the connection to that member immediately, stopping unneeded retransmissions to members which have left.
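
      A minimal sketch of the two parts of the solution, under stated assumptions: LeaveFixSketch and all of its methods are hypothetical stand-ins and not the JGroups API. The map only imitates the idea of a per-member send table that drives retransmission.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the two-part fix; all names are hypothetical, not the JGroups API.
final class LeaveFixSketch {
    /** Stand-in for per-member send connections: member -> number of unacked messages. */
    private final Map<String, Integer> sendTable = new HashMap<>();

    /** Reliable send: the message is tracked for retransmission until it is acked. */
    void sendReliable(String dest) {
        sendTable.merge(dest, 1, Integer::sum);
    }

    /** Unreliable send: bypasses the table, so nothing is ever retransmitted. */
    void sendUnreliable(String dest) {
        // In this toy model the message would go straight to the transport.
    }

    /** Reaction to an MBR_LEFT-style notification: drop the connection at once. */
    void memberLeft(String member) {
        sendTable.remove(member);
    }

    boolean hasConnectionTo(String member) {
        return sendTable.containsKey(member);
    }

    /** Upper bound on how long the leaver waits for a LEAVE-RSP, per the text above. */
    static long maxLeaveWaitMs(long leaveTimeoutMs) {
        return 3 * leaveTimeoutMs;
    }
}
```

      In this model, an unreliably sent LEAVE-RSP never creates an entry to retransmit, and memberLeft removes any entry that other traffic to the departed member may have created. The trade-off is that a dropped LEAVE-RSP is not resent, which is why the leaver's wait is bounded by maxLeaveWaitMs.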

            rhn-engineering-bban Bela Ban