Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 3.1
Affects Version/s: 3.0.10
Labels: None

The good news is that my testing is currently avoiding JGRP-1451-type issues. (I'm running with the latest master, plus my pull request 54.)

The bad news is that this seems to have unblocked me to find the next problem...

I'm running the usual stress test where I kill and restart members, and verify that the group heals itself. I've managed to get into a situation where:

A, B, and C all have no view at all (they're all repeatedly sending JOINs that time out)
D has got stuck with a view {B,C,D,A,C} (in which every member except D is in fact a dead instance).

So what's happening on each of A, B and C is:

perform discovery
decide based on information from D that the long-dead B is coordinator
send a JOIN to that dead B
this times out
repeat

Meanwhile D's FD is repeatedly broadcasting that A is suspect, but no-one pays any attention.

In an ideal world, I'd think that it ought to be up to D to spot that something has gone wrong. E.g., after a long enough period of reporting that A is suspect without seeing any change of view, it could deduce that there's a problem and become a singleton, or something like that. Then a merge should sort everything out in due course.
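To make the idea for D concrete, here is a minimal sketch of that kind of check. It is purely illustrative: `StaleSuspicionDetector` and all of its methods are hypothetical names, not part of JGroups, and the threshold is an assumed parameter. It just remembers when the current run of suspicions started and reports when that run has lasted too long without a view change.

```java
/**
 * Hypothetical sketch, not JGroups code: tracks how long a member has been
 * reporting suspicions without any view change being installed.
 */
final class StaleSuspicionDetector {
    private final long maxStaleMillis;   // assumed tunable threshold
    private long firstSuspectedAt = -1;  // -1 means no outstanding suspicion

    StaleSuspicionDetector(long maxStaleMillis) {
        if (maxStaleMillis <= 0) {
            throw new IllegalArgumentException("maxStaleMillis must be positive");
        }
        this.maxStaleMillis = maxStaleMillis;
    }

    /** Called each time failure detection reports a suspect. */
    void onSuspect(long nowMillis) {
        if (firstSuspectedAt < 0) {
            firstSuspectedAt = nowMillis; // start of the current run of suspicions
        }
    }

    /** Called whenever a new view is installed; this clears the run. */
    void onViewChange() {
        firstSuspectedAt = -1;
    }

    /** True once suspicions have gone unanswered for at least the threshold. */
    boolean shouldBecomeSingleton(long nowMillis) {
        return firstSuspectedAt >= 0 && nowMillis - firstSuspectedAt >= maxStaleMillis;
    }
}
```

Timestamps are passed in rather than read from the clock so the logic can be exercised in a test without sleeping.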

I'm actually experimenting with a workaround in which we only allow JOIN attempts to time out some maximum number of times; if they time out too often, the member becomes a singleton. I.e., I'm making a fix that allows A, B and C to proceed. Then I again expect a merge to sort everything out. This looks a lot easier to code up, and seems a plausible thing to want to do anyway.
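Roughly, the shape of that workaround is the following. This is only a sketch under assumptions: `BoundedJoiner`, `Outcome` and `maxJoinTimeouts` are made-up names for illustration and do not correspond to actual JGroups classes or properties. One "attempt" stands for a full discovery plus JOIN round, returning false if the JOIN timed out.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch, not JGroups code: gives up on joining after a bounded
 * number of JOIN timeouts and falls back to becoming a singleton.
 */
final class BoundedJoiner {
    enum Outcome { JOINED, BECAME_SINGLETON }

    private final int maxJoinTimeouts; // assumed tunable limit

    BoundedJoiner(int maxJoinTimeouts) {
        if (maxJoinTimeouts < 1) {
            throw new IllegalArgumentException("maxJoinTimeouts must be at least 1");
        }
        this.maxJoinTimeouts = maxJoinTimeouts;
    }

    /**
     * @param attemptJoin one discovery + JOIN round; returns true if the JOIN
     *                    was answered, false if it timed out
     */
    Outcome join(BooleanSupplier attemptJoin) {
        for (int timeouts = 0; timeouts < maxJoinTimeouts; timeouts++) {
            if (attemptJoin.getAsBoolean()) {
                return Outcome.JOINED;
            }
            // JOIN timed out: loop round and try discovery + JOIN again
        }
        // Too many timeouts: stop chasing a dead coordinator and start alone;
        // a later merge is expected to reunite the group.
        return Outcome.BECAME_SINGLETON;
    }
}
```

With a limit of 3, a member whose JOINs always time out makes exactly three attempts and then becomes a singleton, instead of repeating forever as A, B and C do above.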

I have the test running and will see how this goes overnight. If it looks like it works, I'll submit a pull request; otherwise I'll think again.

Assignee: Bela Ban

Reporter: David Hotham (Inactive)

Votes: 0

Watchers: 2

Created: 2012/06/25 2:54 PM

Updated: 2012/07/03 4:02 AM

Resolved: 2012/07/03 2:32 AM
