When a jgroups-raft test disconnects two channels in a row, the final view is not updated on the remaining members. The test fails at line https://github.com/jgroups-extras/jgroups-raft/blob/d8aedcb3753b404e621983f195023c7c1cae4870/tests/junit-functional/org/jgroups/tests/VoteTest.java#L162, while waiting for the view to update on all members.
The log file shows:
-- shutdown channels
21:41:04,159 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.r.RAFT): D: change leader from null -> null
21:41:04,159 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.p.GMS): D: sending LEAVE request to A
21:41:04,159 [TRACE] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: handleMembershipChange([LEAVE(D)])
21:41:04,160 [TRACE] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: joiners=[], suspected=[], leaving=[D], new view: [A|4] (3) [A, B, C]
21:41:04,160 [TRACE] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: sending LEAVE response to D
21:41:04,160 [DEBUG] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: installing view [A|4] (3) [A, B, C] (D left)
21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.p.GMS): D: got LEAVE response from A in 1 ms
21:41:04,160 [DEBUG] [jgroups-4,VoteTest,A] (o.j.p.r.ELECTION): A: existing view: [A|3] (4) [A, B, C, D], new view: [A|4] (3) [A, B, C], result: no_change
21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.r.RAFT): D: change leader from null -> null
21:41:04,160 [TRACE] [jgroups-4,VoteTest,A] (o.j.p.p.GMS): A: mcasting view [A|4], ref-view=[A|3], left=[D]
21:41:04,160 [DEBUG] [jgroups-4,VoteTest,B] (o.j.p.p.GMS): B: installing view [A|4] (3) [A, B, C] (D left)
21:41:04,160 [DEBUG] [jgroups-4,VoteTest,C] (o.j.p.p.GMS): C: installing view [A|4] (3) [A, B, C] (D left)
21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.r.RAFT): C: change leader from null -> null
21:41:04,160 [DEBUG] [jgroups-4,VoteTest,B] (o.j.p.r.ELECTION): B: existing view: [A|3] (4) [A, B, C, D], new view: [A|4] (3) [A, B, C], result: no_change
21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.p.GMS): C: last member in the group (coord); leaving now
21:41:04,160 [DEBUG] [jgroups-4,VoteTest,C] (o.j.p.r.ELECTION): C: existing view: [A|3] (4) [A, B, C, D], new view: [A|4] (3) [A, B, C], result: no_change
21:41:04,160 [TRACE] [TestNG-test-Surefire test-2] (o.j.p.r.RAFT): C: change leader from null -> null
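The log shows node C trying to leave while it is concurrently installing the new view: C reports `last member in the group (coord); leaving now` even though view `[A|4]` still lists A, B, and C. A few operations need synchronization: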
- https://github.com/belaban/JGroups/blob/0f138c9124f633a2e6215c2825db2cda51eac5ad/src/org/jgroups/protocols/pbcast/GMS.java#L714-L716
- Internally, the `Membership#set` operation also needs to be atomic: https://github.com/belaban/JGroups/blob/0f138c9124f633a2e6215c2825db2cda51eac5ad/src/org/jgroups/Membership.java#L159-L162
This list is not exhaustive. I haven't read all the uses, so more places might need synchronization.
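
To illustrate the kind of race suspected here, below is a minimal sketch. It assumes a `set` that is built from two individually synchronized steps (clear, then add); whether the linked `Membership#set` has exactly this shape has not been verified here. `SketchMembership` and `MembershipRaceSketch` are hypothetical names and are not JGroups classes. A reader that runs between the two steps can observe an empty or partial member list, which may be how a node could wrongly decide it is the last member.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a membership list; NOT the JGroups Membership class.
class SketchMembership {
    private final List<String> members = new ArrayList<>();

    // Each step takes the lock on its own, so a reader can slip in
    // between clear() and addAll() and see an empty list.
    void setNonAtomic(List<String> newMembers) {
        synchronized (members) { members.clear(); }
        synchronized (members) { members.addAll(newMembers); }
    }

    // Holding the lock across both steps makes the replacement atomic
    // with respect to every other method that synchronizes on `members`.
    void setAtomic(List<String> newMembers) {
        synchronized (members) {
            members.clear();
            members.addAll(newMembers);
        }
    }

    int size() {
        synchronized (members) { return members.size(); }
    }
}

class MembershipRaceSketch {
    // Counts how often a reader sees a size other than 3 while a writer
    // keeps swapping between two 3-member views.
    static int inconsistentReads(boolean atomic, int iterations) throws InterruptedException {
        SketchMembership m = new SketchMembership();
        List<String> v1 = List.of("A", "B", "C");
        List<String> v2 = List.of("B", "C", "D");
        m.setAtomic(v1);

        Thread writer = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                List<String> next = (i % 2 == 0) ? v2 : v1;
                if (atomic) m.setAtomic(next); else m.setNonAtomic(next);
            }
        });
        writer.start();

        int bad = 0;
        while (writer.isAlive()) {
            if (m.size() != 3) bad++;
        }
        writer.join();
        return bad;
    }

    public static void main(String[] args) throws InterruptedException {
        // The non-atomic count depends on thread scheduling and may be 0 on some runs.
        System.out.println("non-atomic: " + inconsistentReads(false, 200_000));
        // The atomic variant should never expose an intermediate state.
        System.out.println("atomic: " + inconsistentReads(true, 200_000));
    }
}
```

The same reasoning applies to any caller that reads the membership and then acts on it: the read and the decision need to happen under the same lock as the update, otherwise the check can be made against a view that is halfway through being replaced.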