Type: Bug
Resolution: Done
Priority: Blocker
Fix Version/s: 2.4
Affects Version/s: 2.3 SP1
Labels: None
Estimated Difficulty: High

There are two use cases in which we can run into the problem, both involving members A and B.

#1 View change

A is running, B joins
B is not blocking in FLUSH, A is blocking after START_FLUSH
A starts the flush
A returns the new view to B in a JOIN_RSP. This causes B's Channel.connect() to return
B sends a unicast message (a STATE_REQ) to A; A services it and sends the response on the same thread
A completes the flush, multicasting a STOP_FLUSH message
The STATE_REQ at A hangs in FLUSH.down()
The STOP_FLUSH at A can never unblock FLUSH.down(), because it was received after the STATE_REQ from B and is queued behind it!
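
The ordering problem in the steps above can be reproduced with a small model. This is only a sketch and not JGroups code: it assumes a single thread delivers messages in arrival order, and it uses a latch to stand in for FLUSH.down() blocking until STOP_FLUSH is seen. The class and method names (InOrderDeliveryDemo, deliverAll) are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not JGroups code: one thread delivers messages in arrival order.
final class InOrderDeliveryDemo {

    /**
     * Delivers the messages in order on a single thread and reports whether
     * all of them were delivered within timeoutMillis.
     */
    static boolean deliverAll(List<String> arrivalOrder, long timeoutMillis) {
        CountDownLatch flushStopped = new CountDownLatch(1); // opened by STOP_FLUSH
        CountDownLatch allDelivered = new CountDownLatch(arrivalOrder.size());
        ExecutorService deliveryThread = Executors.newSingleThreadExecutor();
        try {
            for (String msg : arrivalOrder) {
                deliveryThread.execute(() -> {
                    try {
                        if (msg.equals("STOP_FLUSH")) {
                            flushStopped.countDown(); // unblocks everything waiting on the flush
                        } else {
                            flushStopped.await();     // stands in for the response blocking in FLUSH.down()
                        }
                        allDelivered.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return allDelivered.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            deliveryThread.shutdownNow(); // interrupts a delivery thread that is stuck
        }
    }

    public static void main(String[] args) {
        // STOP_FLUSH arrives first: both messages are delivered.
        System.out.println(deliverAll(List.of("STOP_FLUSH", "STATE_REQ"), 500)); // true
        // STATE_REQ arrives first: it blocks, and STOP_FLUSH is stuck behind it.
        System.out.println(deliverAll(List.of("STATE_REQ", "STOP_FLUSH"), 500)); // false
    }
}
```

In the second call the STATE_REQ occupies the only delivery thread while waiting for an event that can only be produced by the message queued behind it, which is the hang described above.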

SOLUTION:

1. Make B block in FLUSH.down() as soon as the client sends the JOIN_REQ to A
2. Make STOP_FLUSH synchronous. This means we only return from Channel.connect() (for example) once every member has acked the STOP_FLUSH. See the next use case (#2, state transfer) for a description of what happens if we don't do this.
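
As an illustration of point 2, a synchronous STOP_FLUSH could collect one ack per member and only let the caller return once all of them have arrived. The sketch below is an assumption about how such bookkeeping might look, not the actual FLUSH implementation; StopFlushAckCollector and its methods are hypothetical names, and members are identified by plain strings for brevity.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of JGroups: tracks STOP_FLUSH acks from all members.
final class StopFlushAckCollector {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final CountDownLatch done;

    StopFlushAckCollector(Set<String> members) {
        pending.addAll(members);
        done = new CountDownLatch(members.size());
    }

    /** Records an ack; duplicate acks and acks from unknown senders are ignored. */
    void ack(String member) {
        if (pending.remove(member)) {
            done.countDown();
        }
    }

    /** Blocks until every member has acked, or returns false when the timeout expires. */
    boolean awaitAll(long timeoutMillis) {
        try {
            return done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        StopFlushAckCollector acks = new StopFlushAckCollector(Set.of("A", "B"));
        acks.ack("A");
        System.out.println(acks.awaitAll(50)); // false: B has not acked yet
        acks.ack("B");
        System.out.println(acks.awaitAll(50)); // true: connect() could return now
    }
}
```

In this model, the thread that multicasts STOP_FLUSH would call awaitAll() before returning from connect() or getState(), so no follow-up unicast can be sent while a member is still blocked.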

#2 State transfer

A and B are members of the group
B calls Channel.getState()
A and B receive a START_FLUSH and start blocking in FLUSH
State is transferred from A to B
B multicasts a STOP_FLUSH and immediately afterwards sends a unicast message (which can 'pass' multicast messages, as they're unrelated)
A happens to receive the unicast message before the STOP_FLUSH. The unicast blocks and the STOP_FLUSH, which would unblock it, cannot be delivered

SOLUTION:

1. Same as solution 2 above. If we make the STOP_FLUSH phase synchronous, connect() or getState() only return once everyone has been unblocked

LONG TERM SOLUTION:

A much better solution is to make the STOP_FLUSH message out-of-band, so that it can be delivered in parallel with other messages and is not blocked by, for example, the unicast ahead of it in the queue. Even if the unicast message is blocked waiting for STOP_FLUSH, the STOP_FLUSH will still be delivered once it has been received, causing the unicast to unblock.
Once we have this solution in place (2.5, threadless stack and out-of-band messages), we can revert STOP_FLUSH to use only 1 phase rather than 2.
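
The effect of out-of-band delivery can be sketched with a toy model. It assumes regular messages are delivered in order on one thread while out-of-band messages are handed to a separate thread pool. This is not the JGroups 2.5 API; OobDeliveryDemo and deliverAll are hypothetical names, and a latch again stands in for FLUSH.down() blocking until STOP_FLUSH is seen.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model, not JGroups code: STOP_FLUSH bypasses the in-order delivery queue.
final class OobDeliveryDemo {

    /** Returns true if every message was delivered within timeoutMillis. */
    static boolean deliverAll(List<String> arrivalOrder, long timeoutMillis) {
        CountDownLatch flushStopped = new CountDownLatch(1); // opened by STOP_FLUSH
        CountDownLatch allDelivered = new CountDownLatch(arrivalOrder.size());
        ExecutorService regular = Executors.newSingleThreadExecutor(); // in-order delivery
        ExecutorService outOfBand = Executors.newCachedThreadPool();   // parallel delivery
        try {
            for (String msg : arrivalOrder) {
                boolean oob = msg.equals("STOP_FLUSH"); // assumption: only STOP_FLUSH is out-of-band
                (oob ? outOfBand : regular).execute(() -> {
                    try {
                        if (oob) {
                            flushStopped.countDown(); // delivered even while the regular queue is stuck
                        } else {
                            flushStopped.await();     // stands in for blocking in FLUSH.down()
                        }
                        allDelivered.countDown();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            return allDelivered.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            regular.shutdownNow();
            outOfBand.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // The unicast arrives before STOP_FLUSH, yet both are delivered.
        System.out.println(deliverAll(List.of("STATE_REQ", "STOP_FLUSH"), 500)); // true
    }
}
```

Because STOP_FLUSH no longer waits behind the blocked unicast, arrival order stops mattering in this model, which is why a single-phase STOP_FLUSH would be sufficient again.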

relates to

JGRP-337 Make STOP_FLUSH phase in FLUSH asynchronous

Resolved

Assignee: Vladimir Blagojevic (Inactive)

Reporter: Bela Ban

Created: 2006/10/03 3:28 AM

Updated: 2006/10/13 7:29 PM

Resolved: 2006/10/13 7:29 PM
