[JGRP-2277] GMS: change the way a coordinator leaves gracefully - Red Hat Issue Tracker

Type: Enhancement
Resolution: Done
Priority: Major
Fix Version/s: 4.0.13
Affects Version/s: None
Labels: None


Consider a cluster {A,B,C,D} with view A|5.

There's a discrepancy in handling coordinator A (1) leaving gracefully and (2) crashing.

When A crashes, the second-in-line (B) will install new view B|6={B,C,D} in the cluster.

However, when A leaves gracefully, A itself installs view B|6={B,C,D}. The problem is that A terminates shortly afterwards and therefore cannot retransmit the VIEW message to a member which dropped it. Inconsistencies may arise as a result, and these have to be healed by MERGE3 (see JGRP-2276 for a description).
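In both cases the resulting view is the same; only the member that installs it differs. The view arithmetic can be sketched as follows. This is a minimal illustration, assuming a hypothetical `SimpleView` class and `without` helper that are not part of JGroups: the departed member is removed, the first remaining member becomes the new coordinator, and the view number is incremented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a cluster view; this is not the JGroups View class.
final class SimpleView {
    final String creator;        // member that created the view (the coordinator)
    final long id;               // view number, incremented on each membership change
    final List<String> members;  // ordered; the first member is the coordinator

    SimpleView(String creator, long id, List<String> members) {
        this.creator = creator;
        this.id = id;
        this.members = new ArrayList<>(members);
    }

    // Computes the successor view after `gone` has left or crashed.
    SimpleView without(String gone) {
        List<String> rest = new ArrayList<>(members);
        rest.remove(gone);
        if (rest.isEmpty())
            throw new IllegalStateException("no members left");
        return new SimpleView(rest.get(0), id + 1, rest);
    }

    @Override
    public String toString() {
        return creator + "|" + id + "={" + String.join(",", members) + "}";
    }

    public static void main(String[] args) {
        SimpleView v5 = new SimpleView("A", 5, Arrays.asList("A", "B", "C", "D"));
        System.out.println(v5.without("A")); // prints B|6={B,C,D}
    }
}
```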

A better scheme would be for A to send a LEAVE message to the second-in-line (B), which then creates and installs view B|6, and then replies with a LEAVE_RSP message to A.

This has the following advantages:

- The code for handling a crashed coordinator and a coordinator which leaves gracefully is similar: in both cases, the second-in-line member installs the new view.
- The second-in-line (B) stays up and can therefore retransmit a dropped VIEW message (unlike A, which terminates after a timeout). As long as A is able to send the LEAVE message to B, B will handle it. If A crashes, B can also handle the view installation.
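The proposed handshake can be sketched as a toy, single-threaded model. Everything here is an assumption made for illustration (the `LeaveSketch` and `Member` classes, the `handleLeave` method, the `viewsToDrop` loss counter and the retransmit bound); none of it is JGroups code, and real retransmission is more involved. The point it shows is that the member installing the view stays in the cluster, so it can resend a dropped VIEW before replying with LEAVE_RSP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the proposed scheme. All names are hypothetical.
final class LeaveSketch {
    static final int MAX_RETRANSMITS = 10; // arbitrary bound for the sketch

    static final class Member {
        final String name;
        String installedView;
        int viewsToDrop; // simulates loss of the next N VIEW messages

        Member(String name, String installedView) {
            this.name = name;
            this.installedView = installedView;
        }

        // Returns true (an ack) if the VIEW arrived and was installed.
        boolean receiveView(String view) {
            if (viewsToDrop > 0) {
                viewsToDrop--;
                return false;
            }
            installedView = view;
            return true;
        }
    }

    // Runs on the second-in-line when the coordinator `leaver` sends LEAVE.
    static String handleLeave(String leaver, long currentViewId, List<Member> members) {
        List<Member> remaining = new ArrayList<>();
        List<String> names = new ArrayList<>();
        for (Member m : members) {
            if (!m.name.equals(leaver)) {
                remaining.add(m);
                names.add(m.name);
            }
        }
        if (remaining.isEmpty())
            throw new IllegalStateException("no members left");
        String newView = names.get(0) + "|" + (currentViewId + 1)
                + "={" + String.join(",", names) + "}";

        // The new coordinator stays up, so it can resend until each member acks.
        for (Member m : remaining) {
            int attempts = 0;
            while (!m.receiveView(newView)) {
                if (++attempts > MAX_RETRANSMITS)
                    throw new IllegalStateException(m.name + " never acked " + newView);
            }
        }
        return "LEAVE_RSP"; // sent to the leaver only after the view is installed
    }

    public static void main(String[] args) {
        String old = "A|5={A,B,C,D}";
        Member a = new Member("A", old);
        Member b = new Member("B", old);
        Member c = new Member("C", old);
        Member d = new Member("D", old);
        c.viewsToDrop = 2; // C loses the first two VIEW messages
        String rsp = handleLeave("A", 5, Arrays.asList(a, b, c, d));
        System.out.println(rsp + " " + c.installedView); // prints LEAVE_RSP B|6={B,C,D}
    }
}
```

In this model C still ends up with B|6 despite losing two VIEW messages, which is the case the old scheme could not cover once A had terminated.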

is related to: JGRP-2276 MERGE3: a dead member as merge leader will never trigger a merge (Resolved)

relates to: ISPN-9496 Some xsite tests hang during teardown (Closed)

Assignee: Bela Ban

Reporter: Bela Ban

Votes: 0

Watchers: 1

Created: 2018/06/13 10:38 AM

Updated: 2018/09/10 3:40 AM

Resolved: 2018/07/02 8:44 AM
