Type: Bug
Resolution: Done
Priority: Major
Affects Version/s: 3.0.10, 3.1
Fix Version/s: None
Per JGRP-1426 etc., I want views and messages to be sequenced. That is, if node A sees view V1 before message M1, then no node shall see M1 before V1, and vice versa. To that end, I have SEQUENCER below GMS in my stack.
As a slight aside, perhaps it would be helpful if I said a little more about why I desire this property. Skip this bit if you like.
I'm holding an application-level leadership election, in which I need to take into account various properties not available to JGroups. I do this as follows (a rough code sketch follows the list):
- Whenever a node sees a change of view, it broadcasts status information.
- When the coordinator has received status messages from everyone in the view, it chooses the new leader.
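Concretely, the scheme is roughly the following. (This is only an illustrative sketch, not my actual code: StatusMsg and chooseLeader stand in for my application-level status type and election logic, and error handling is omitted.)

import org.jgroups.*;
import java.io.Serializable;
import java.util.*;

// Illustrative sketch only: broadcast status on every view change, and let the coordinator
// elect a leader once it has heard from every member of the view it has currently installed.
public class ElectionSketch extends ReceiverAdapter {
    private final JChannel channel;
    private View currentView;
    private final Map<Address, StatusMsg> statuses = new HashMap<Address, StatusMsg>();

    public ElectionSketch(JChannel channel) {
        this.channel = channel;
    }

    @Override
    public synchronized void viewAccepted(View view) {
        currentView = view;
        statuses.clear();                         // start collecting afresh for the new view
        try {
            channel.send(null, new StatusMsg());  // broadcast our status to the whole group
        }
        catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Override
    public synchronized void receive(Message msg) {
        Object obj = msg.getObject();
        if (!(obj instanceof StatusMsg) || currentView == null)
            return;
        statuses.put(msg.getSrc(), (StatusMsg) obj);
        boolean iAmCoordinator = channel.getAddress().equals(currentView.getMembers().get(0));
        if (iAmCoordinator && statuses.keySet().containsAll(currentView.getMembers()))
            chooseLeader(statuses);               // elect using the application-level properties
    }

    private void chooseLeader(Map<Address, StatusMsg> statuses) { /* application-specific */ }

    public static class StatusMsg implements Serializable { /* application-level properties */ }
}

The important point is that statuses are only counted against the view that the receiving node has currently installed, which is why the relative ordering of views and messages matters to me.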
In particular, then, it's bad if a status message triggered by a view change reaches the coordinator before the coordinator has seen that change of view. If this happens, then when the coordinator does see the change of view it waits for a status message that never comes (because that message was already received before the view change).
If JGroups is unable to provide the property that I'm looking for, then application-level workarounds do suggest themselves. Eg nodes could retransmit status messages, or the coordinator could run a timer and then choose a leader anyway, and so on. But I had been hoping that this would not be necessary.
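For instance, the timer-based workaround might look something like the sketch below. (Again, this is illustrative only; the class name, method names, and the five-second timeout are all made up.)

import java.util.concurrent.*;

// Illustrative sketch of the timer workaround: on each view change the coordinator waits a bounded
// time for status messages, then elects with whatever it has, so a status message that overtook
// its view change cannot stall the election forever.
public class ElectionTimeoutSketch {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> pendingElection;

    // Called by the coordinator when it sees a view change.
    synchronized void onViewChange() {
        if (pendingElection != null)
            pendingElection.cancel(false);
        pendingElection = timer.schedule(new Runnable() {
            public void run() {
                electWithWhateverWeHave();   // fall back rather than wait for missing statuses
            }
        }, 5, TimeUnit.SECONDS);
    }

    // Called when status messages from every member of the current view have arrived.
    synchronized void onAllStatusesReceived() {
        if (pendingElection != null)
            pendingElection.cancel(false);
        electNormally();
    }

    void electWithWhateverWeHave() { /* application-specific */ }
    void electNormally()           { /* application-specific */ }
}

The obvious downside is that the timeout is arbitrary, and a slow but live member could be left out of the election, which is why I'd much rather have the ordering guarantee from the stack.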
I've just seen a case where:
- we have two groups: [A] and [D, B, C]
- we perform a merge, to get the new view [A, D, B, C]
- A sends INSTALL_MERGE_VIEW to both A and D
- Now there's a race. D wins, and broadcasts the new view to [D, B, C]
- The application at D sees the view change, and this causes it to broadcast a message in the new view [A, D, B, C]
- This arrives at A and overtakes the installation of the view there
- So the application at A sees D's message before it sees the new view, whereas at D the opposite was true.
Here's some trace from A showing it installing the view on thread Incoming-1, and that installation being overtaken by the message on thread Incoming-2:
2012-05-23 00:12:20.381 [Incoming-1,Clumpy Test Cluster,CFS-A-tinkywinky] TRACE org.jgroups.protocols.pbcast.GMS - CFS-A-tinkywinky: mcasting view MergeView::[CFS-A-tinkywinky|3] [CFS-A-tinkywinky, CFS-B-chucklebrothers, CFS-B-tinkywinky, CFS-A-chucklebrothers], subgroups=[CFS-B-chucklebrothers|2] [CFS-B-chucklebrothers, CFS-B-tinkywinky, CFS-A-chucklebrothers], [CFS-A-tinkywinky|0] [CFS-A-tinkywinky] (4 mbrs)
2012-05-23 00:12:20.386 [Incoming-1,Clumpy Test Cluster,CFS-A-tinkywinky] DEBUG org.jgroups.protocols.pbcast.GMS - CFS-A-tinkywinky: installing view MergeView::[CFS-A-tinkywinky|3] [CFS-A-tinkywinky, CFS-B-chucklebrothers, CFS-B-tinkywinky, CFS-A-chucklebrothers], subgroups=[CFS-B-chucklebrothers|2] [CFS-B-chucklebrothers, CFS-B-tinkywinky, CFS-A-chucklebrothers], [CFS-A-tinkywinky|0] [CFS-A-tinkywinky]
2012-05-23 00:12:20.395 [Incoming-2,Clumpy Test Cluster,CFS-A-tinkywinky] TRACE org.jgroups.protocols.TCP - received [dst: CFS-A-tinkywinky, src: CFS-B-chucklebrothers (3 headers), size=87 bytes], headers are SEQUENCER: FORWARD (tag=[CFS-B-chucklebrothers|15]), UNICAST2: DATA, seqno=2, conn_id=3, TCP: [channel_name=Clumpy Test Cluster]
2012-05-23 00:12:20.396 [Incoming-2,Clumpy Test Cluster,CFS-A-tinkywinky] TRACE org.jgroups.protocols.SEQUENCER - CFS-A-tinkywinky: broadcasting CFS-B-chucklebrothers::15
2012-05-23 00:12:20.396 [Incoming-2,Clumpy Test Cluster,CFS-A-tinkywinky] TRACE org.jgroups.protocols.SEQUENCER - CFS-A-tinkywinky: delivering CFS-B-chucklebrothers::15
2012-05-23 00:12:20.397 [Incoming-2,Clumpy Test Cluster,CFS-A-tinkywinky] INFO c.m.c.CommunicatorComponent$Communicator - CFS-B-chucklebrothers has sent us: ClusterMgmtMsg(STATUS,false,Some(10.239.0.4),Some(CFS-B-chucklebrothers),Some(ChangeId(0,0)),Some(134283264),Some(0),None,None)
2012-05-23 00:12:20.421 [Incoming-1,Clumpy Test Cluster,CFS-A-tinkywinky] INFO c.m.c.CommunicatorComponent$Communicator - New view: MergeView::[CFS-A-tinkywinky|3] [CFS-A-tinkywinky, CFS-B-chucklebrothers, CFS-B-tinkywinky, CFS-A-chucklebrothers], subgroups=[CFS-B-chucklebrothers|2] [CFS-B-chucklebrothers, CFS-B-tinkywinky, CFS-A-chucklebrothers], [CFS-A-tinkywinky|0] [CFS-A-tinkywinky]
(The last two lines are from my application. I also have trace from D showing it sending its message after installing the view, which I can provide if required).
My thoughts:
- Could this particular case be fixed by removing the special case that goes "If we're the only member the VIEW is broadcast to, let's simply install the view directly" at GMS.java line 495? My thinking is that if the view were sent as a message, then both it and the message from D (which has gone via SEQUENCER) would be messages from A, so the overtaking seen above could not happen.
- However, I've been unable to convince myself that this will fix the more general case where a message from subgroup 1 arrives in subgroup 2 before subgroup 2 has installed the view.
- Indeed, I'm struggling to think of a way to make this work without changing the way that merge-views are installed.
- It feels to me as though the cleanest way to achieve what I'm looking for would be to have the new coordinator broadcast the new view to everyone, rather than having each of the old coordinators deal with its own subgroup. Then there would be no races between subgroups.
- But presumably there are reasons why it works the way it does?
- And I expect that this would be quite a disruptive change to make.
What do you think?
Thanks, as ever, for your help...