Details
Type: Feature Request
Resolution: Done
Priority: Major
Version: 3.1
Description
I first raised this, or something very like it, in JGRP-1468, but it got lost among the other fixes made in that issue.
I've just seen a case where, after a merge view, the following happens at a node W:

                SEQUENCER           GMS            APPLICATION
                    |                |                  |
    B (view)        |                |                  |
    --------------->|                |                  |
                    |--------------->|                  |
                    |                |----\             |
    C (view)        |                |     \            |
    --------------->|                |      \           |
                    |--------------->|       \          |
                    |                |        \         |
    C (msg)         |                |         |        |
    --------------->|                |         |        |
                    |----------------|---------|------->|
                    |                |         |        |
                    |                |          \       |
                    |                |           \----->|

- Initially the view is {B,W}
- There's a merge view, in which B and C were the coordinators of the old (pre-merge) views
- B and C both broadcast the new view {B,C,T,W}. In fact, both know W's physical address (or, if you like, we're using UDP multicast), so the view is sent to W twice
- B's view arrives on thread Incoming-1 and gets as far as GMS
- C's view arrives on thread Incoming-2 and gets as far as GMS, where it is dropped because it is the same view
- A message from C, prompted by the change in view, arrives on thread Incoming-2
- This message overtakes B's view, which is still on thread Incoming-1, and is delivered to the application
- Only then does thread Incoming-1 get to deliver the view to the application
That is, from the application's point of view, C broadcast the view and then the message, whereas at node W the application received the message and then the view.
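The interleaving above can be modelled in a few lines of Java. This is a hypothetical simulation, not JGroups code: the class ReorderingRepro, the string events, and the AtomicReference standing in for GMS's current view are all assumptions for illustration. The two latches force the thread scheduling described in the list, so the outcome is deterministic rather than timing-dependent.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model (not JGroups code) of the unsynchronized up-path at node W.
// Latches force the interleaving from the report so that the result is deterministic.
class ReorderingRepro {
    static final String VIEW = "view {B,C,T,W}";
    static final String MSG = "msg from C";

    // GMS stand-in: records the view and reports whether it was new (false = duplicate, dropped)
    static boolean gmsAccepts(AtomicReference<String> current, String view) {
        return !view.equals(current.getAndSet(view));
    }

    static List<String> run() throws InterruptedException {
        List<String> app = new CopyOnWriteArrayList<>();           // events seen by the application
        AtomicReference<String> gmsView = new AtomicReference<>(); // last view accepted by GMS
        CountDownLatch viewInGms = new CountDownLatch(1);          // B's view has reached GMS
        CountDownLatch msgDelivered = new CountDownLatch(1);       // C's message reached the app

        Thread incoming1 = new Thread(() -> {
            if (gmsAccepts(gmsView, VIEW)) {   // B's copy is new, so GMS accepts it...
                viewInGms.countDown();
                awaitQuietly(msgDelivered);    // ...but Incoming-1 is overtaken before passing it up
                app.add(VIEW);
            }
        }, "Incoming-1");

        Thread incoming2 = new Thread(() -> {
            awaitQuietly(viewInGms);
            if (gmsAccepts(gmsView, VIEW)) {   // C's copy is the same view, so this is false
                app.add(VIEW);
            }
            app.add(MSG);                      // C's message goes straight through to the application
            msgDelivered.countDown();
        }, "Incoming-2");

        incoming1.start();
        incoming2.start();
        incoming1.join();
        incoming2.join();
        return app;
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints [msg from C, view {B,C,T,W}]
    }
}
```

Running main() prints the message before the view, which is the order the application observed at W.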
I think the fix will simply be to put a lock in SEQUENCER around up_prot.up(evt) in SEQUENCER.deliver(). That way, an event cannot start up the stack from SEQUENCER until the previous one has been fully delivered, so messages reach the application in the same order in which they arrive at SEQUENCER.
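A minimal sketch of the proposed fix, again as a hypothetical model: LockedSequencer, its deliverLock, and the Consumer that stands in for up_prot.up(evt) are illustrative names, not the real SEQUENCER fields. The only point it demonstrates is that deliver() holds one lock for the whole up-call, so C's copy of the view and C's message cannot start up the stack on Incoming-2 until Incoming-1 has finished delivering B's view.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical sketch (not the real SEQUENCER): deliver() holds a lock around the
// up-call, so an event arriving later on another thread cannot overtake one that
// is still travelling up the stack.
class LockedSequencer {
    static final String VIEW = "view {B,C,T,W}";
    static final String MSG = "msg from C";

    private final ReentrantLock deliverLock = new ReentrantLock();
    private final Consumer<String> upProt; // stands in for up_prot.up(evt)

    LockedSequencer(Consumer<String> upProt) {
        this.upProt = upProt;
    }

    void deliver(String evt) {
        deliverLock.lock();
        try {
            upProt.accept(evt); // the whole trip up the stack happens under the lock
        } finally {
            deliverLock.unlock();
        }
    }

    static List<String> run() throws InterruptedException {
        List<String> app = new CopyOnWriteArrayList<>();
        AtomicReference<String> gmsView = new AtomicReference<>();
        CountDownLatch viewInGms = new CountDownLatch(1);

        // GMS + APPLICATION stand-in: duplicate views are dropped; a new view
        // lingers in GMS for a while before it reaches the application.
        LockedSequencer seq = new LockedSequencer(evt -> {
            if (evt.startsWith("view")) {
                if (evt.equals(gmsView.getAndSet(evt))) {
                    return;        // C's copy of the same view is dropped
                }
                viewInGms.countDown();
                sleepQuietly(100); // Incoming-1 is slow; the lock, not this sleep, fixes the order
            }
            app.add(evt);
        });

        Thread incoming1 = new Thread(() -> seq.deliver(VIEW), "Incoming-1");
        Thread incoming2 = new Thread(() -> {
            try {
                viewInGms.await(); // arrive while B's view is still inside GMS
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            seq.deliver(VIEW);     // blocks on the lock, then is dropped as a duplicate
            seq.deliver(MSG);      // can only reach the application after the view
        }, "Incoming-2");

        incoming1.start();
        incoming2.start();
        incoming1.join();
        incoming2.join();
        return app;
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // prints [view {B,C,T,W}, msg from C]
    }
}
```

Here the application sees the view before the message. The trade-off of this approach is that every up-call through deliver() is serialized, so a slow or blocking application callback holds up all the other incoming threads at that node.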
Edit: updated description for clarity