Details

Type: Enhancement
Resolution: Done
Priority: Major
Fix Version/s: 4.0.13
Affects Version/s: None
Labels: None

Description

When MERGE3 uses TCP, it cannot multicast its INFO message, and therefore uses the discovery protocol (e.g. MPING) to fetch the targets to send the INFO message to.

Since we don't know how many responses to expect, we simply block for (min_interval + max_interval /2) ms. This is bad: it delays the sending of INFO messages, and because we're unlikely to get responses from all members within that window, the result is often only a partial merge. A full merge is delayed accordingly, e.g. when we have many singleton subclusters. A heavily split cluster will therefore likely require more merge rounds than necessary when using TCP, compared to (e.g.) UDP.

Solution:

The discovery process should be reactive rather than blocking: instead of waiting for N seconds, we simply pass a function to the discovery protocol that gets invoked whenever a response has been received.
When the function gets invoked, it sends an INFO to the respective member.
This prevents a thread from blocking for N seconds.
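
The idea above can be illustrated with a minimal, self-contained Java sketch. It does not use the JGroups API: the names Discovery, SimulatedDiscovery, InfoSender and run are hypothetical stand-ins chosen for illustration, and the actual change is in the pull request referenced below. The sketch only shows the shape of the approach, where the caller hands a callback to the discovery step and an INFO is sent per response instead of after a fixed wait.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

class ReactiveDiscoverySketch {

    /** Hypothetical stand-in for a discovery protocol (e.g. MPING); not the JGroups interface. */
    interface Discovery {
        // Invokes onResponse once for every discovery response received.
        void findMembers(Consumer<String> onResponse);
    }

    /** Hypothetical stand-in for the part of the merge protocol that sends INFO messages. */
    static class InfoSender {
        final List<String> sent = new CopyOnWriteArrayList<>();

        // Records an INFO message addressed to the given member.
        void sendInfo(String member) {
            sent.add("INFO -> " + member);
        }
    }

    /** Simulated discovery: delivers one response per known member to the callback. */
    static class SimulatedDiscovery implements Discovery {
        private final List<String> members;

        SimulatedDiscovery(List<String> members) {
            this.members = members;
        }

        @Override
        public void findMembers(Consumer<String> onResponse) {
            for (String member : members) {
                onResponse.accept(member); // react to each response as it arrives
            }
        }
    }

    /** Runs discovery with a callback that sends an INFO per response; no fixed-interval wait. */
    static List<String> run(List<String> members) {
        InfoSender sender = new InfoSender();
        Discovery discovery = new SimulatedDiscovery(members);
        discovery.findMembers(sender::sendInfo);
        return sender.sent;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("A", "B", "C")));
    }
}
```

In this sketch no thread sleeps for a fixed interval: each member gets its INFO as soon as its response is seen, and the number of responses does not need to be known in advance.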

See [1] for details.
[1] https://github.com/belaban/JGroups/pull/389

Issue Links

blocks: WFLY-9420 Cluster partitions can take up to cca 2 minutes to merge with TCP stack (Closed)

People

Assignee: Bela Ban

Reporter: Bela Ban

Votes: 0

Watchers: 3

Dates

Created: 2018/06/26 8:50 AM

Updated: 2020/09/14 5:03 AM

Resolved: 2018/06/27 5:09 AM