Make the MERGE2/GMS/Merger code more robust and scale better in large clusters.

While a merge is in progress, stop sending discovery requests. This reduces unnecessary traffic, especially in large clusters, where each discovery response includes the entire view of a sub-cluster.
When we start a merge or receive a MERGE-REQUEST, start a timer that cancels the merge after <merge_timeout * 2> milliseconds. This is similar to the MergeKiller code and prevents stale merges, e.g. ones left behind by a crashed merge leader.
If the merge participants are A, B, C, D and E, but A only receives merge responses from itself, B and D, don't cancel the merge; instead proceed with merging A, B and D. Currently this is not done: the merge is cancelled unless every participant responds.
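
The three points above could be sketched roughly as follows. This is only an illustration of the proposed behaviour, not the actual MERGE2/GMS/Merger code: the class MergeSketch and the methods startMerge, cancelMerge, shouldSendDiscovery and membersToMerge are hypothetical names, and members are represented as plain strings instead of JGroups addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the proposal; not the JGroups Merger API. */
class MergeSketch {
    private final long mergeTimeoutMs;
    // Daemon thread so that a pending canceller never keeps the JVM alive.
    private final ScheduledExecutorService timer =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "merge-canceller");
            t.setDaemon(true);
            return t;
        });
    private ScheduledFuture<?> canceller;
    private boolean merging;

    MergeSketch(long mergeTimeoutMs) {
        this.mergeTimeoutMs = mergeTimeoutMs;
    }

    synchronized boolean isMerging() {
        return merging;
    }

    /** Point 1: no discovery requests while a merge is in progress. */
    synchronized boolean shouldSendDiscovery() {
        return !merging;
    }

    /** Point 2: call when we start a merge or receive a MERGE-REQUEST. */
    synchronized void startMerge() {
        if (merging)
            return; // a merge is already running; keep its canceller
        merging = true;
        // Cancel a stale merge after merge_timeout * 2 milliseconds.
        canceller = timer.schedule(this::cancelMerge,
                                   2 * mergeTimeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Ends the merge, either normally or because the canceller fired. */
    synchronized void cancelMerge() {
        merging = false;
        if (canceller != null) {
            canceller.cancel(false);
            canceller = null;
        }
    }

    /**
     * Point 3: proceed with the participants that actually responded,
     * in participant order, instead of cancelling the whole merge.
     */
    static List<String> membersToMerge(List<String> participants,
                                       Set<String> responders) {
        List<String> result = new ArrayList<>();
        for (String p : participants) {
            if (responders.contains(p))
                result.add(p);
        }
        return result;
    }
}
```

With participants A, B, C, D, E and responses from A, B and D, membersToMerge returns A, B and D, so the merge continues with that subset. A discovery task would check shouldSendDiscovery() before each round and skip the round while a merge is running.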

is related to

JGRP-1387 MERGE3: merging in large clusters

relates to

JGRP-100 Large-scale JGroups

JGRP-1377 Merge: add second line of defense for killing of runaway merge tasks