Bug
Resolution: Done
Major
3.0.9
None
The testcase is essentially the same as in JGRP-1443 and JGRP-1449, i.e. a group of 4 members, where I simultaneously kill two at random and let them restart, and expect the group to heal itself. To rule out SEQUENCER-related issues, I've removed that protocol from the stack.
I've got into a situation where:
- members A, B, C see the same sequence of views and end up in a group [A, B, C]
- but member D believes that the latest view is [C, D, A].
I think I've identified the problem. First, here's the relevant trace (from D):
2012-04-15 10:47:37.910 [ViewHandler,TestCluster,10.239.0.4] DEBUG org.jgroups.protocols.pbcast.GMS - suspected members=[10.239.0.3], suspected_mbrs=[10.239.0.3]
2012-04-15 10:47:37.961 [ViewHandler,TestCluster,10.239.0.4] DEBUG org.jgroups.protocols.pbcast.GMS - suspected members=[10.239.0.2], suspected_mbrs=[10.239.0.3, 10.239.0.2]
2012-04-15 10:47:37.961 [ViewHandler,TestCluster,10.239.0.4] DEBUG org.jgroups.protocols.pbcast.GMS - members are [10.239.0.3, 10.239.0.2, 10.239.0.4, 10.239.0.1], coord=10.239.0.4: I'm the new coord !
2012-04-15 10:47:38.011 [ViewHandler,TestCluster,10.239.0.4] TRACE org.jgroups.protocols.pbcast.GMS - 10.239.0.4: new members=[], suspected=[10.239.0.2], leaving=[], new view: [10.239.0.3|629] [10.239.0.3, 10.239.0.4, 10.239.0.1]
2012-04-15 10:47:38.012 [ViewHandler,TestCluster,10.239.0.4] TRACE org.jgroups.protocols.pbcast.GMS - 10.239.0.4: mcasting view [10.239.0.3|629] [10.239.0.3, 10.239.0.4, 10.239.0.1] (3 mbrs)
It looks to me as though D has received separate reports that B and C are suspected, and correctly worked out that in that case he'll be coordinator of a new group [D, A]. But when he actually becomes coordinator, he only remembers that B is suspected, so he sends out a bogus view that still contains C.
If this is correct, I think that the bug is in ParticipantGmsImpl.java at the end of handleMembershipChange. I think that the final loop should iterate over suspected_mbrs (before clearing this value) and not over suspectedMembers.
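To make the suspected difference concrete, here is a minimal sketch of the two behaviours. It is not the JGroups code: the class SuspectViewSketch and the method viewAfterRemoving are hypothetical names invented for illustration, and members are reduced to the letters used above (the member order follows the "members are" line of the trace). It only assumes that the view is computed by removing a collection of suspects from the current membership.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: hypothetical names, not the real ParticipantGmsImpl.
class SuspectViewSketch {

    // Builds the next view by removing the given suspects from the membership.
    static List<String> viewAfterRemoving(List<String> members, Iterable<String> toRemove) {
        List<String> view = new ArrayList<>(members);
        for (String suspect : toRemove) {
            view.remove(suspect);
        }
        return view;
    }

    public static void main(String[] args) {
        // Membership as seen by D in the trace: .3=C, .2=B, .4=D, .1=A.
        List<String> members = Arrays.asList("C", "B", "D", "A");

        // Stand-in for the accumulated suspected_mbrs field.
        Set<String> accumulated = new LinkedHashSet<>();

        // First report: C is suspected.
        List<String> firstReport = Arrays.asList("C");
        accumulated.addAll(firstReport);

        // Second report: B is suspected; D now decides it is coordinator.
        List<String> secondReport = Arrays.asList("B");
        accumulated.addAll(secondReport);

        // Using only the latest report reproduces the bogus view.
        System.out.println(viewAfterRemoving(members, secondReport)); // [C, D, A]

        // Using the accumulated set gives the view the other members expect.
        System.out.println(viewAfterRemoving(members, accumulated));  // [D, A]
    }
}
```

The first printed view matches the [C, D, A] view that D multicast in the trace. The second is what D would send if the final loop used the accumulated suspects.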
Perhaps this is a bit speculative - you'll be able to tell me if I'm on the wrong track!
I'll keep the full trace so that we can do further analysis if required; and I'll try out a fix along the lines outlined above.