-
Task
-
Resolution: Done
-
Major
-
None
-
None
Make the MERGE2/GMS/Merger code more robust and scale better in large clusters.
- While a merge is going on, stop sending out discovery requests. This reduces unnecessary traffic, especially in large clusters where discovery responses include the entire view of a sub-cluster
- If we start a merge, or receive a MERGE-REQUEST, start a timer which cancels the merge after <merge_timeout *2> milliseconds. This is similar to the MergeKiller code, and prevents stale merges, e.g. by a crashed merge leader
- If we have merge participants A,B,C,D,E but A only receives merge responses from itself, B and D, then don't cancel the merge, but instead proceed with merging A, B and D. This is currently not done, but a merge is cancelled when we don't get responses from every participant.