JGroups / JGRP-1452

SEQUENCER goes wrong when members fail simultaneously


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 3.1
    • 3.0.9
    • None

      Consider the case where the current view is [A, B, C, D], and A and B both die more or less simultaneously.

      C will now try to broadcast the new view [C, D]. But if SEQUENCER is in the stack, this goes wrong: SEQUENCER on C doesn't yet know that it is the coordinator, so it tries to forward the message to either A or B, both of which are dead. The view change gets stuck.

      The problem looks to be in handleSuspect(). It assumes that there is at most one suspect, removes that member from the list of members, and picks the new coordinator from whoever is left. This fails in the case just described, where the member it picks may itself be dead.
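
      A minimal sketch of this failure mode, in plain Java rather than the actual SEQUENCER code. The class SingleSuspectDemo and the method coordAfterSuspect are hypothetical names, and the sketch assumes that each suspicion is evaluated against the unchanged view [A, B, C, D] and that the first remaining member is taken as coordinator:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the single-suspect logic described above.
// This is NOT the JGroups implementation, only an illustration.
final class SingleSuspectDemo {

    // Drops exactly one suspected member from a copy of the member list and
    // treats the first remaining member as the coordinator (assumption).
    static String coordAfterSuspect(List<String> members, String suspect) {
        List<String> remaining = new ArrayList<>(members);
        remaining.remove(suspect);
        return remaining.isEmpty() ? null : remaining.get(0);
    }

    public static void main(String[] args) {
        List<String> view = Arrays.asList("A", "B", "C", "D");

        // A and B die together; C handles each suspicion on its own.
        System.out.println(coordAfterSuspect(view, "A")); // prints "B", which is dead
        System.out.println(coordAfterSuspect(view, "B")); // prints "A", which is dead
    }
}
```

      Under these assumptions, neither suspicion on its own ever leads C to conclude that it is the coordinator.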

      IMHO it's a mistake for SEQUENCER to try to duplicate the work that the GMS layer does in computing the new view. I'm currently trying a fix that removes handleSuspect() from SEQUENCER altogether and instead pays attention to TMP_VIEW events. So far this seems to be working.
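
      The idea behind that fix can be sketched as follows. This is not the actual patch: ViewDrivenCoordDemo, onView and isCoord are hypothetical names, members are plain strings rather than JGroups addresses, and the sketch relies on the JGroups convention that the first member of a view is the coordinator. The point is only that the coordinator is taken from the member list delivered by the membership layer, instead of being guessed from individual suspicions:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: track the coordinator purely from delivered views.
// This is NOT the JGroups SEQUENCER code.
final class ViewDrivenCoordDemo {
    private final String localAddr;   // this member's own address
    private volatile String coord;    // coordinator according to the latest view

    ViewDrivenCoordDemo(String localAddr) {
        this.localAddr = localAddr;
    }

    // Called whenever the membership layer hands down a (temporary or final) view.
    void onView(List<String> members) {
        coord = members.isEmpty() ? null : members.get(0);
    }

    boolean isCoord() {
        return localAddr.equals(coord);
    }

    String coord() {
        return coord;
    }

    public static void main(String[] args) {
        ViewDrivenCoordDemo c = new ViewDrivenCoordDemo("C");
        c.onView(Arrays.asList("A", "B", "C", "D"));
        System.out.println(c.isCoord()); // prints "false"

        // A and B die together; the membership layer computes [C, D] in one step.
        c.onView(Arrays.asList("C", "D"));
        System.out.println(c.isCoord()); // prints "true"
    }
}
```

      Because the new view already excludes both dead members, no intermediate state exists in which C would forward to A or B.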

      David

              Assignee: Bela Ban (rhn-engineering-bban)
              Reporter: David Hotham (dimbleby, inactive)