Details

Type: Feature Request
Resolution: Done
Priority: Major
Fix Version/s: 3.2
Affects Version/s: 3.1
Labels: None

Description

I first raised this, or something very like it, in JGRP-1468, but it got lost among the other fixes that were made in that issue.

I've just seen a case where after a merge-view the following happens at a node W:

          SEQUENCER            GMS            APPLICATION
             |                  |                  |
 B (view)    |                  |                  |
------------>|                  |                  |
             |----------------->|                  |
             |                  |----\             |
 C (view)    |                  |     \            |
------------>|                  |      \           |
             |----------------->|       \          |
             |                  |        \         |
 C (msg)     |                  |        |         |
------------>|                  |        |         |
             |---------------------------|-------->|
             |                  |        |         |
             |                  |        \         |
             |                  |         \------->|

Initially the view is {B,W}
There's a merge view, in which B and C were the old coordinators
B and C both broadcast the new view {B,C,T,W}. In fact both know the physical address of the member shown above (or we're using UDP multicast, if you like), so the view is sent to this node twice.
B's view arrives on thread Incoming-1, and gets as far as GMS
C's view arrives on thread Incoming-2, and gets as far as GMS (where it is dropped, since it is the same view)
A message from C, prompted by the change in view, arrives on thread Incoming-2
This overtakes the view on thread Incoming-1, and is delivered to the application
Only then does thread Incoming-1 get to deliver the view to the application

I.e. from the application's point of view, C broadcast the view and then the message, whereas at the node shown above the application received the message and then the view.

I think that the fix will simply be to put a lock in SEQUENCER around up_prot.up(evt) in SEQUENCER.deliver(). That way messages will be delivered to the application in the same order as they arrive at SEQUENCER.
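To make the proposal concrete, here is a minimal sketch of the idea. It is not the actual JGroups code: `Sequencer`, `UpHandler`, `deliveryLock` and `runScenario` are hypothetical names, events are plain strings, and the 50 ms pause just stands in for the view being held up on its way through GMS. The lock is created as fair on the assumption that waiting threads should be let through in the order they asked for it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class OrderedDeliverySketch {

    /** Hypothetical stand-in for the protocol above SEQUENCER (GMS, then the application). */
    interface UpHandler {
        void up(String event);
    }

    /** Hypothetical stand-in for SEQUENCER: every up-call is serialized by one lock. */
    static final class Sequencer {
        // Fair lock: waiting threads acquire it in the order they requested it.
        private final ReentrantLock deliveryLock = new ReentrantLock(true);
        private final UpHandler upProt;

        Sequencer(UpHandler upProt) {
            this.upProt = upProt;
        }

        void deliver(String event) {
            deliveryLock.lock();
            try {
                upProt.up(event); // held for the whole trip up the stack
            } finally {
                deliveryLock.unlock();
            }
        }
    }

    /** Replays the scenario: a slow view on Incoming-1, then a message on Incoming-2. */
    static List<String> runScenario() {
        List<String> delivered = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch viewEntered = new CountDownLatch(1);

        UpHandler application = event -> {
            if (event.equals("view")) {
                viewEntered.countDown(); // the view is now above SEQUENCER
                pause(50);               // simulate the view being delayed in GMS
            }
            delivered.add(event);
        };
        Sequencer sequencer = new Sequencer(application);

        Thread incoming1 = new Thread(() -> sequencer.deliver("view"), "Incoming-1");
        Thread incoming2 = new Thread(() -> {
            try {
                viewEntered.await(); // the message arrives only after the view has passed SEQUENCER
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            sequencer.deliver("msg"); // blocks until the view has reached the application
        }, "Incoming-2");

        incoming1.start();
        incoming2.start();
        try {
            incoming1.join();
            incoming2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(delivered);
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Calling `OrderedDeliverySketch.runScenario()` returns `[view, msg]`: Incoming-2 cannot get past `deliver()` until Incoming-1 has finished its up-call. Without the lock, the same scenario would record the message first. The cost is that the whole up-call path above SEQUENCER becomes single-threaded.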

Edit: updated description for clarity

People

Assignee: Bela Ban

Reporter: David Hotham (Inactive)

Votes: 0

Watchers: 1

Dates

Created: 2012/07/23 2:58 PM

Updated: 2012/08/31 7:39 AM

Resolved: 2012/08/31 7:39 AM