Type: Enhancement
Resolution: Done
Priority: Major
Consider a cluster {A,B,C,D} with view A|5.
The two ways in which coordinator A can go away, (1) leaving gracefully and (2) crashing, are currently handled differently.
When A crashes, the second-in-line (B) will install new view B|6={B,C,D} in the cluster.
However, when A leaves gracefully, A itself installs view B|6={B,C,D}. Because A terminates after a timeout, it cannot retransmit the VIEW message to a member that dropped it, so inconsistencies may arise which then have to be healed by MERGE3 (see JGRP-2276 for a description).
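The inconsistency can be illustrated with a small simulation. This is only a sketch, not JGroups code: `View`, `without` and `leaverInstallsView` are hypothetical names, and message loss is modeled by simply listing the members that drop the VIEW message.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GracefulLeaveProblem {
    // Hypothetical stand-in for a view: an id plus an ordered member list,
    // where the first member is the coordinator.
    static final class View {
        final long id;
        final List<String> members;

        View(long id, List<String> members) {
            this.id = id;
            this.members = members;
        }

        String coordinator() {
            return members.get(0);
        }

        @Override
        public String toString() {
            return coordinator() + "|" + id + "=" + members;
        }
    }

    // The view that follows 'current' once 'leaver' is gone.
    static View without(View current, String leaver) {
        List<String> remaining = new ArrayList<>(current.members);
        remaining.remove(leaver);
        return new View(current.id + 1, remaining);
    }

    // The leaver multicasts the new view once and then terminates. Members
    // listed in 'dropped' lose the message and keep their old view, because
    // nobody is left to retransmit it.
    static Map<String, View> leaverInstallsView(View current, String leaver,
                                                List<String> dropped) {
        View next = without(current, leaver);
        Map<String, View> installed = new LinkedHashMap<>();
        for (String member : next.members) {
            installed.put(member, dropped.contains(member) ? current : next);
        }
        return installed;
    }

    public static void main(String[] args) {
        View v5 = new View(5, Arrays.asList("A", "B", "C", "D"));
        Map<String, View> result = leaverInstallsView(v5, "A", Arrays.asList("D"));
        result.forEach((member, view) -> System.out.println(member + " has " + view));
    }
}
```

With D dropping the message, B and C end up in view B|6 while D still believes it is in A|5 with A as coordinator. This is the kind of divergence that a later merge has to repair.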
A better scheme would be for A to send a LEAVE_REQ message to the second-in-line (B), which then creates and installs view B|6 and replies to A with a LEAVE_RSP message.
This has the following advantages:
- The code for handling a crashed coordinator and the code for handling a coordinator that leaves gracefully become similar: in both cases the second-in-line member installs the new view
- The second-in-line (B) stays up and can therefore retransmit a dropped VIEW message (unlike A, which terminates after a timeout). As long as A is able to send a LEAVE_REQ to B, B will handle it. If A crashes before doing so, B handles the view installation anyway, as in the crash case.
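The proposed hand-off can be sketched as follows. This is a simplified model under stated assumptions, not the JGroups implementation: `Member`, `cluster`, `handleLeaveReq` and `leave` are hypothetical names, message loss is simulated with a per-member counter, and the retransmit-until-acked loop stands in for whatever retransmission mechanism the real protocol stack uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DelegatedLeave {
    enum Type { LEAVE_REQ, LEAVE_RSP }

    static final class Member {
        final String name;
        long viewId;
        List<String> members;
        int viewsToDrop; // number of VIEW messages this member will lose (simulated)

        Member(String name, long viewId, List<String> members) {
            this.name = name;
            this.viewId = viewId;
            this.members = members;
        }

        // Returns true (an ack) if the VIEW message arrived, false if it was lost.
        boolean onView(long newId, List<String> newMembers) {
            if (viewsToDrop > 0) {
                viewsToDrop--;
                return false;
            }
            if (newId > viewId) {
                viewId = newId;
                members = newMembers;
            }
            return true;
        }
    }

    // Builds a cluster in which every member has installed the same view;
    // the first name is the coordinator.
    static List<Member> cluster(long viewId, String... names) {
        List<String> all = Arrays.asList(names);
        List<Member> result = new ArrayList<>();
        for (String n : names) {
            result.add(new Member(n, viewId, all));
        }
        return result;
    }

    // Runs on the second-in-line when it receives LEAVE_REQ from the leaver.
    // It stays up, so it can keep retransmitting until every member has acked.
    static Type handleLeaveReq(Member leaver, List<Member> cluster) {
        List<Member> remaining = new ArrayList<>(cluster);
        remaining.remove(leaver);
        Member successor = remaining.get(0);

        List<String> names = new ArrayList<>();
        for (Member m : remaining) {
            names.add(m.name);
        }
        long newId = successor.viewId + 1;

        List<Member> pending = new ArrayList<>(remaining);
        while (!pending.isEmpty()) {
            pending.removeIf(m -> m.onView(newId, names)); // resend to non-ackers
        }
        return Type.LEAVE_RSP;
    }

    // Runs on the leaving coordinator: delegate view installation, then wait
    // for LEAVE_RSP before terminating.
    static boolean leave(List<Member> cluster) {
        Member leaver = cluster.get(0);
        return handleLeaveReq(leaver, cluster) == Type.LEAVE_RSP;
    }

    public static void main(String[] args) {
        List<Member> c = cluster(5, "A", "B", "C", "D");
        c.get(3).viewsToDrop = 1; // D loses the first VIEW message
        System.out.println("A received LEAVE_RSP: " + leave(c));
        for (Member m : c.subList(1, c.size())) {
            System.out.println(m.name + " is in view " + m.viewId + " " + m.members);
        }
    }
}
```

The point of the design is that the retransmission loop runs on B, which remains a member of the new view, rather than on A, which is about to terminate. The same `handleLeaveReq` logic could be triggered by a failure suspicion instead of a LEAVE_REQ, which is why the two code paths converge.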