  1. JGroups
  2. JGRP-1393

Optimization of concurrent joining to a non-existing cluster


    • Type: Enhancement
    • Resolution: Done
    • Priority: Major
    • 2.12.3, 3.0.1, 3.1
    • None
    • None

      When we have no members running yet, and A, B, C and D join a cluster at exactly the same time, the following can happen:

      • A starts, sends a discovery request. B and C reply. A returns after N seconds with responses from A, B and C.
      • B starts, sends a discovery request. A and C reply. B returns after N seconds with responses from A, B and C.
      • C starts, sends a discovery request. A and B reply. C returns after N seconds with responses from A, B and C.
      • D starts, sends a discovery request. A, B and C reply. D returns after N seconds with responses from A, B, C and D.

      Responses are:
      A: ABC
      B: ABC
      C: ABC
      D: ABCD

      Note that A, B and C don't have D's response.

      The algorithm now has every member sort all of the responses, and pick the first as new coordinator. Say we have the following sorted lists:

      A: BAC
      B: BAC
      C: BAC
      D: DBAC
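
The selection step above can be sketched as follows. This is a minimal illustration, not JGroups code: addresses are plain strings, and `ORDER`, `BY_RANK` and `pickCoordinator` are hypothetical names. The ordering D < B < A < C is an assumption chosen only to reproduce the sorted lists in the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CoordinatorPick {
    // Hypothetical total order over addresses, chosen only to reproduce
    // the sorted lists in the example (D < B < A < C).
    static final List<String> ORDER = Arrays.asList("D", "B", "A", "C");

    static final Comparator<String> BY_RANK =
            Comparator.comparingInt(addr -> ORDER.indexOf(addr));

    // Each member sorts the responses it collected and takes the first one;
    // taking the minimum under the same order is equivalent.
    static String pickCoordinator(Collection<String> responses) {
        return Collections.min(responses, BY_RANK);
    }

    public static void main(String[] args) {
        // Responses collected by each member, as listed in the description.
        Map<String, List<String>> responses = new LinkedHashMap<>();
        responses.put("A", Arrays.asList("A", "B", "C"));
        responses.put("B", Arrays.asList("A", "B", "C"));
        responses.put("C", Arrays.asList("A", "B", "C"));
        responses.put("D", Arrays.asList("A", "B", "C", "D"));

        for (Map.Entry<String, List<String>> e : responses.entrySet()) {
            System.out.println(e.getKey() + " picks " + pickCoordinator(e.getValue()));
        }
    }
}
```

With these inputs, A, B and C pick B while D picks D, because only D's list contains D.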

      The issue is now that both B and D will become coordinators, and a merge is needed to establish the correct cluster membership.

      The reason is that A, B and C apparently started slightly sooner than D (we're talking 1-2 ms), so D didn't get their discovery requests and thus didn't send back a discovery response.

      Even though D started a bit after A, B and C, the latter will still receive D's discovery request (but not a discovery response from D). We can take advantage of this and simply add D's address to every member's discovery responses when D's discovery request is received, not only when a response from D is received.

      This will greatly reduce the chances of a merge having to be done as a result of concurrent startup.
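
A minimal sketch of the proposed change, assuming a simplified model in which each member keeps a set of the addresses it has learned during discovery. `DiscoveryCollector`, `onResponse` and `onRequest` are hypothetical names for illustration and do not correspond to the actual JGroups discovery classes.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

class DiscoveryCollector {
    // Addresses learned so far during this discovery round.
    private final Set<String> members = new LinkedHashSet<>();

    DiscoveryCollector(String self) {
        // A member's own address is part of its response list.
        members.add(self);
    }

    // Existing behavior: a discovery response reveals its sender.
    void onResponse(String sender) {
        members.add(sender);
    }

    // Proposed behavior: a discovery request also reveals its sender,
    // even if that sender never answers our own request.
    void onRequest(String sender) {
        members.add(sender);
    }

    Set<String> members() {
        return Collections.unmodifiableSet(members);
    }

    public static void main(String[] args) {
        DiscoveryCollector a = new DiscoveryCollector("A");
        a.onResponse("B");
        a.onResponse("C");
        a.onRequest("D"); // D started late: A sees D's request but gets no response
        System.out.println(a.members());
    }
}
```

In this model A ends up knowing about A, B, C and D, so all four members sort the same list and agree on one coordinator.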

            rhn-engineering-bban Bela Ban