1. Set up a cluster of four nodes, A, B, C, and D: A and B on one machine (Host 1), and C and D on another (Host 2).
2. Configure all four nodes with S3_PING as the discovery mechanism, and set remove_all_files_on_view_change to true.
3. Start up nodes in the order A, B, C, D.
4. In the S3 bucket, there should be a single file listing all four nodes, with node A flagged as the coordinator. Ensure that node B's UUID is larger than node C's UUID when the two are compared as two's complement (signed) integers. If it is not, shut down all nodes and restart them in order, repeating until the desired relationship holds. Because the comparison is signed, a UUID whose first hex digit is 8 or higher is treated as negative. For example, a UUID starting with 'a' is less than one starting with 'b', which in turn is less than one starting with '1'.
5. On Host 1, use iptables to block all traffic going to and coming from Host 2.
sudo iptables -A INPUT -s <Host 2 IP addr> -j DROP
sudo iptables -A OUTPUT -d <Host 2 IP addr> -j DROP
6. Allow a few minutes for the nodes to detect the network partition. Eventually you should see two files in the S3 bucket, one per subcluster.
7. Using Ctrl-C, stop node A.
8. You should soon find only a single file in the bucket, containing a single entry for node B. This is a result of the remove_all_files_on_view_change setting on S3_PING, which we set to true to avoid accumulation of old files in the bucket.
9. Resolve the network partition:
sudo iptables -F OUTPUT
sudo iptables -F INPUT
10. You will find that, even after many minutes, the subclusters are not merged.
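The ordering check in step 4 can be done mechanically instead of by eye. The Java sketch below compares two UUIDs as signed 64-bit halves, most significant bits first, which is the two's complement comparison described in step 4. The class and method names are made up for illustration, and it assumes the UUIDs from the discovery file can be written in the standard 8-4-4-4-12 form accepted by `UUID.fromString`.

```java
import java.util.UUID;

// Illustrative helper (hypothetical names): signed, two's complement ordering of UUIDs.
public class SignedUuidOrder {
    // Compare the most significant 64 bits as signed longs; fall back to the
    // least significant 64 bits only when the high halves are equal.
    static int signedCompare(UUID x, UUID y) {
        int hi = Long.compare(x.getMostSignificantBits(), y.getMostSignificantBits());
        return hi != 0 ? hi : Long.compare(x.getLeastSignificantBits(), y.getLeastSignificantBits());
    }

    // True when node B's UUID is larger than node C's, i.e. the setup step 4 asks for.
    static boolean isDesiredOrder(UUID nodeB, UUID nodeC) {
        return signedCompare(nodeB, nodeC) > 0;
    }

    public static void main(String[] args) {
        UUID a = UUID.fromString("a0000000-0000-0000-0000-000000000000");
        UUID b = UUID.fromString("b0000000-0000-0000-0000-000000000000");
        UUID one = UUID.fromString("10000000-0000-0000-0000-000000000000");
        // A first hex digit of 8 or higher sets the sign bit, so 'a' and 'b' are negative.
        System.out.println(signedCompare(a, b) < 0);   // true
        System.out.println(signedCompare(b, one) < 0); // true
        System.out.println(isDesiredOrder(one, b));    // true
    }
}
```

To check step 4, pass node B's and node C's UUIDs to `isDesiredOrder(b, c)`; it returns true when B's UUID is the larger of the two under signed comparison.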
I believe the reason why the subclusters are never merged is as follows:
- MERGE3 on nodes B, C, and D uses S3_PING to find the members to send INFO messages to. Each node finds only node B in the discovery file, so only node B's view consistency checker has anything to work with.
- On node B, the consistency checker can see that there are two coordinators, B and C. However, node C has a lower UUID, so node B defers to it to perform the merge. Node C never performs the merge because, as mentioned above, it is not receiving any INFO messages.
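To make that reasoning concrete, here is a toy Java model of it. It is not the MERGE3 code: the node names, the UUID values, and the rule that only a merge leader that itself sees the inconsistency acts are simplifying assumptions taken from the description above. It shows that with only B in the discovery file nobody starts a merge, while with C also discoverable, C would.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Toy model of the reasoning above; NOT the JGroups MERGE3 implementation.
public class MergeModel {
    // Signed (two's complement) ordering: most significant 64 bits first, then least.
    static int signedCompare(UUID x, UUID y) {
        int c = Long.compare(x.getMostSignificantBits(), y.getMostSignificantBits());
        return c != 0 ? c : Long.compare(x.getLeastSignificantBits(), y.getLeastSignificantBits());
    }

    /**
     * Returns the node that would start a merge, or null if no node would.
     * coordinatorOf: live node -> coordinator of that node's current view
     * discovered:    members listed in the discovery file(s)
     * ids:           node -> UUID (hypothetical values)
     */
    static String mergeInitiator(Map<String, String> coordinatorOf,
                                 Set<String> discovered,
                                 Map<String, UUID> ids) {
        // Every live node sends an INFO message, carrying its view's coordinator,
        // to every member it found via discovery.
        Map<String, Set<String>> coordsSeenBy = new HashMap<>();
        for (String coordinator : coordinatorOf.values()) {
            for (String target : discovered) {
                if (!coordinatorOf.containsKey(target)) continue; // stopped nodes receive nothing
                coordsSeenBy.computeIfAbsent(target, k -> new HashSet<>()).add(coordinator);
            }
        }
        Comparator<String> byUuid = (x, y) -> signedCompare(ids.get(x), ids.get(y));
        for (Map.Entry<String, Set<String>> seen : coordsSeenBy.entrySet()) {
            if (seen.getValue().size() < 2) continue; // this node sees no inconsistency
            // The coordinator with the lowest UUID is expected to merge; any other
            // node that notices the inconsistency only defers to it.
            String leader = Collections.min(seen.getValue(), byUuid);
            if (leader.equals(seen.getKey())) return leader;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, UUID> ids = new HashMap<>();
        // Hypothetical UUIDs chosen so that C < B under signed ordering (step 4).
        ids.put("B", UUID.fromString("10000000-0000-0000-0000-000000000000"));
        ids.put("C", UUID.fromString("b0000000-0000-0000-0000-000000000000"));
        ids.put("D", UUID.fromString("20000000-0000-0000-0000-000000000000"));
        // After A is stopped: B is alone in its view, C coordinates {C, D}.
        Map<String, String> coordinatorOf = new HashMap<>();
        coordinatorOf.put("B", "B");
        coordinatorOf.put("C", "C");
        coordinatorOf.put("D", "C");
        System.out.println(mergeInitiator(coordinatorOf, Set.of("B"), ids));      // null
        System.out.println(mergeInitiator(coordinatorOf, Set.of("B", "C"), ids)); // C
    }
}
```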
I think this problem would also affect FILE_PING and the other protocols derived from FILE_PING. Looking at the latest 4.x code, the problem appears to still exist there.
I think the crux of the issue is that the coordinator on Host 2 (node C) does not re-create its discovery file after it is deleted by node B. Would it be reasonable for FILE_PING.findMembers() to create the discovery file if the node is a coordinator and the file doesn't exist?
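A minimal sketch of that idea, written against a local directory the way FILE_PING stores its files rather than against S3. `ensureOwnFile`, the `.list` file names, and the one-member-per-line format are all hypothetical; the real `FILE_PING.findMembers()` and its file format are not reproduced here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical sketch of the proposed check; not JGroups code.
public class RecreateDiscoveryFile {
    /**
     * Meant to run at the start of each discovery round (e.g. from findMembers()).
     * If this node is a coordinator and its discovery file is missing, because
     * another coordinator removed it, write the file again.
     * Returns true if the file was (re)created.
     */
    static boolean ensureOwnFile(Path dir, String fileName, boolean isCoordinator,
                                 List<String> members) throws IOException {
        Path file = dir.resolve(fileName);
        if (!isCoordinator || Files.exists(file)) {
            return false; // non-coordinators never write; an existing file is left alone
        }
        Files.createDirectories(dir);
        Files.write(file, members, StandardCharsets.UTF_8); // one member per line
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("discovery");
        // State after step 8: B's view change left only B's file in the directory.
        Files.write(dir.resolve("B.list"), List.of("B"), StandardCharsets.UTF_8);
        // Next discovery round on C (coordinator of {C, D}): its file is missing.
        System.out.println(ensureOwnFile(dir, "C.list", true, List.of("C", "D")));  // true
        System.out.println(Files.readAllLines(dir.resolve("C.list")));              // [C, D]
        // Later rounds leave the existing file alone.
        System.out.println(ensureOwnFile(dir, "C.list", true, List.of("C", "D")));  // false
        // D is not a coordinator, so it never writes a file.
        System.out.println(ensureOwnFile(dir, "D.list", false, List.of("C", "D"))); // false
    }
}
```

With remove_all_files_on_view_change still set to true, B could delete C's file again on its next view change, but under this sketch C would write it back on its following discovery round, so the other nodes can again discover C and send it INFO messages.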