-
Task
-
Resolution: Done
-
Major
-
2.4, 2.5
-
None
#1 Superfluous FLUSH-COMPLETED phase ?
--------------------------------------
It seems we don't need the FLUSH-COMPLETED phase: it is sufficient for
the FLUSH leader to send a START-FLUSH and wait for all FLUSH-OK
messages. Why ? Because the reconciliation protocol will reconcile
messages before the view installation (or state transfer).
Example:
{A,B,C}. E sends message M and then crashes. Only C receives
M.
- A does a START-FLUSH
- B and C send a FLUSH-OK back to A. This means that all multicasts
from B will be seen by A before B's FLUSH-OK is received, and all
multicasts from C will be seen by A before C's FLUSH-OK is received by
A. HOWEVER, C will not necessarily send M to A before sending the
FLUSH-OK message. This is only done through the reconciliation
protocol !
- FLUSH-COMPLETED can be removed, the digest that's part of
FLUSH-COMPLETED has to be moved to the FLUSH-OK phase.
- FLUSH-OK is unicast not multicast.
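A minimal sketch of what the leader side could look like once the digest travels with FLUSH-OK. This is not JGroups code: the class FlushOkCollector, its methods, and the representation of a digest as a map from sender to highest seqno seen are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of JGroups): tracks the unicast FLUSH-OK
// replies a flush leader is waiting for after sending START-FLUSH.
final class FlushOkCollector {
    // Members that have not yet replied with FLUSH-OK.
    private final Set<String> pending;
    // member -> digest carried in its FLUSH-OK (sender -> highest seqno seen).
    private final Map<String, Map<String, Long>> digests = new HashMap<>();

    FlushOkCollector(Set<String> expectedMembers) {
        this.pending = new HashSet<>(expectedMembers);
    }

    // Called when a FLUSH-OK arrives; returns true once every member has replied.
    synchronized boolean onFlushOk(String member, Map<String, Long> digest) {
        if (pending.remove(member)) {
            digests.put(member, new HashMap<>(digest));
        }
        return pending.isEmpty();
    }

    // For each sender, the highest seqno reported by any member. A gap between
    // this and a member's own digest is what reconciliation would have to fill.
    synchronized Map<String, Long> mergedDigest() {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> d : digests.values()) {
            d.forEach((sender, seqno) -> merged.merge(sender, seqno, Math::max));
        }
        return merged;
    }
}
```

In the example above, B would report that it has seen nothing new from E while C would report M, so the merged digest shows that M still needs to be reconciled before the view is installed.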
#2 Superfluous STOP-FLUSH-OK phase ?
------------------------------------
Why do we need the acks here ? JGroups guarantees the STOP-FLUSH
message is delivered to all non-faulty members, so why wait for the
STOP-FLUSH-OK ?
If we make STOP-FLUSH OOB (JGRP-337), then this should work ! We can
get rid of STOP-FLUSH-OK !! Just send STOP-FLUSH as OOB multicast.
Note: this requires the concurrent stack, so possibly check for
concurrent stack. Throw an exception if concurrent stack is not
present when FLUSH is used. So, these changes cannot be applied to
2.4, only to 2.5 and 2.6.
#3 Timeouts
-----------
- ClientGmsImpl: has a hard coded timeout of 4 secs
- CoordGmsImpl also has a timeout for block()
- If we need timeouts, we should get them from the FLUSH protocol !
CoordGmsImpl.startFlush() should not have a timeout, this should be
in FLUSH itself. startFlush() should block forever, until it gets the ack.
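The idea of keeping the timeout inside FLUSH can be sketched as follows. Everything here is hypothetical (the class FlushSketch, its constructor parameters, and the convention that a timeout of 0 or less means "wait forever"); it only illustrates that the caller blocks without choosing a timeout itself.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch (not the JGroups FLUSH protocol): the timeout is a
// property of the flush component, not of the GMS code that calls it.
final class FlushSketch {
    private final long timeoutMs;      // assumption: <= 0 means wait forever
    private final CountDownLatch acks; // one count per expected FLUSH-OK

    FlushSketch(int expectedAcks, long timeoutMs) {
        this.acks = new CountDownLatch(expectedAcks);
        this.timeoutMs = timeoutMs;
    }

    // Called whenever a FLUSH-OK is received.
    void onFlushOk() {
        acks.countDown();
    }

    // Blocks until all acks have arrived. Returns false only if a configured
    // timeout elapsed or the waiting thread was interrupted.
    boolean startFlush() {
        try {
            if (timeoutMs <= 0) {
                acks.await();
                return true;
            }
            return acks.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }
}
```

With this shape, callers such as CoordGmsImpl would simply invoke startFlush() and carry no hard-coded timeout of their own.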
#4 Spurious FLUSH-OK in the diagram
-----------------------------------
In the JOIN diagram, probably also in the state transfer diagram
- incorporates
-
JGRP-337 Make STOP_FLUSH phase in FLUSH asynchronous
- Resolved