- Feature Request
- Resolution: Done
- Major
- When B suspects C, B multicasts a SUSPECT(C) message
- Everyone receives the SUSPECT(C) message and passes it up and down the stack as a SUSPECT(C) event
- VERIFY_SUSPECT on every member sends one (or more) ARE_YOU_DEAD messages to C
- C replies to each sender with an I_AM_NOT_DEAD message, or does not reply if it has crashed
- However, only the coordinator (or the next-in-line) actually processes the SUSPECT(C) event in GMS!
--> All of the VERIFY_SUSPECT processing is superfluous unless the member is the coord or the next-in-line!
The number of messages used for a false suspicion is (1 SUSPECT mcast) + ((N-1) ARE_YOU_DEAD unicasts) + ((N-1) I_AM_NOT_DEAD unicasts)!
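To put numbers on the formula above, here is a minimal Java sketch. The class and method names (SuspectCost, currentScheme, restrictedScheme) are hypothetical and not part of JGroups. It assumes each verifying member sends exactly one ARE_YOU_DEAD and receives exactly one I_AM_NOT_DEAD, although the protocol may send more than one.

```java
// Hypothetical helper, not JGroups code: counts messages for one false suspicion.
final class SuspectCost {

    // Current behaviour: 1 SUSPECT mcast + (N-1) ARE_YOU_DEAD + (N-1) I_AM_NOT_DEAD.
    static int currentScheme(int n) {
        int suspectMcast = 1;
        int areYouDead = n - 1;   // every member except C probes C
        int iAmNotDead = n - 1;   // C answers every probe
        return suspectMcast + areYouDead + iAmNotDead;
    }

    // Assumed cost if only `verifiers` members (e.g. coord and next-in-line)
    // run VERIFY_SUSPECT; `suspectMsgs` is 1 for a multicast SUSPECT or the
    // number of unicasts otherwise.
    static int restrictedScheme(int suspectMsgs, int verifiers) {
        return suspectMsgs + 2 * verifiers;
    }

    public static void main(String[] args) {
        int n = 10;
        System.out.println("current:    " + currentScheme(n));        // 19
        System.out.println("restricted: " + restrictedScheme(1, 2));  // 5
    }
}
```

Under these assumptions the restricted cost no longer grows with the cluster size N.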
SOLUTION:
- The SUSPECT(C) message could be sent as a unicast only to the coordinator and the next-in-line member. Maybe we could use a max_rank=2 for this, similar to the suggested solution for FD_ALL? This would be good for transports that are not multicast-based, e.g. TCP
- The SUSPECT(C) message is multicast to everyone, but only the coord and next-in-line start the VERIFY_SUSPECT processing
Issue: if we have {A,B,C,D,E}, what happens if A, B and C crash at the same time?
- E's connection to A closes: E sends a SUSPECT(A) to B and C (excluding suspected A)
--> B and C are dead and won't process the message!
- Then E suspects B and sends a SUSPECT(A,B) to C and D (excluding suspected A and B)
- C adds A and B to its suspect list and finds out it is the next-in-line
- C then runs the VERIFY_SUSPECT protocol
- C passes the SUSPECT(A,B) event up the stack
- C becomes the new coord
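The recipient selection used in the steps above ("the first max_rank members, excluding the suspected ones") could be sketched as follows. This is an illustration only: SuspectTargets and targets are hypothetical names, members are plain strings rather than JGroups addresses, and the view is assumed to be ordered by rank with the coordinator first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not JGroups code.
final class SuspectTargets {

    // Returns the first maxRank members of the view that are not suspected,
    // preserving view order (coordinator first).
    static List<String> targets(List<String> view, Set<String> suspects, int maxRank) {
        List<String> result = new ArrayList<>();
        for (String member : view) {
            if (result.size() == maxRank) {
                break;
            }
            if (!suspects.contains(member)) {
                result.add(member);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> view = List.of("A", "B", "C", "D", "E");
        System.out.println(targets(view, Set.of("A"), 2));      // [B, C]
        System.out.println(targets(view, Set.of("A", "B"), 2)); // [C, D]
    }
}
```

With max_rank=2 this reproduces the recipients in the scenario: SUSPECT(A) goes to B and C, and SUSPECT(A,B) goes to C and D. The sketch does not exclude the sender itself; a real implementation would presumably handle the event locally if the sender is among the targets.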