JGroups: JGRP-1486

Merge failure when dead instances remain in view


Details

    • Bug
    • Resolution: Done
    • Major
    • 3.1
    • 3.0.10
    • None

    Description

      I hit this while testing my JGRP-1485 fix, but I think it's a logically independent issue.

      So, I've reached a point where:

      • A, B and C all have view {C,A,B}
      • D has view {B', D', D, A', C}, in which B', D' and A' are all dead instances

      As with JGRP-1485, the ideal fix would be for D to recover all by itself, but it's not clear to me how to do that. I did, however, expect a merge to sort things out, and if it did, that ought to be good enough.

      But what's actually happening is this:

      • C becomes merge leader
      • determines that the merge participants are C, D', D and A'
      • sends MERGE_REQ to those members
      • the MERGE_REQ to D' reaches D (and that to A' reaches A)
      • D sends a positive response to the MERGE_REQ that was meant for it but, after 2.5 seconds, also sends a negative response to the MERGE_REQ meant for D' (I think the negative response is because it can't fetch the digest from D')
      • likewise A sends a negative response to the MERGE_REQ meant for A'
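      The misdelivery in the fourth step can be pictured with a small model. It assumes, which the report does not state, that D' and A' were earlier incarnations of D and A bound to the same host and port, so a message addressed to the stale logical address D' is routed to whichever process now listens at that physical address. The class, method names and addresses are hypothetical, not JGroups' API.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of why a MERGE_REQ addressed to the dead member D'
 * ends up at D. Not JGroups code: names and addresses are illustrative.
 */
public class StaleAddressDemo {
    // Logical address -> physical address, as the sender (C) may still have it cached.
    static final Map<String, String> logicalToPhysical = new HashMap<>();
    // Physical address -> logical address of the process currently listening there.
    static final Map<String, String> listenerAt = new HashMap<>();

    static {
        // Assumption: D' and A' were earlier incarnations of D and A on the same host:port.
        logicalToPhysical.put("C", "hostC:7800");
        logicalToPhysical.put("D'", "hostD:7800");
        logicalToPhysical.put("D", "hostD:7800");
        logicalToPhysical.put("A'", "hostA:7800");
        listenerAt.put("hostC:7800", "C");
        listenerAt.put("hostD:7800", "D");
        listenerAt.put("hostA:7800", "A");
    }

    /** Returns the live member that actually receives a message sent to logical address {@code dest}. */
    static String receiverOf(String dest) {
        String physical = logicalToPhysical.get(dest);
        return physical == null ? null : listenerAt.get(physical);
    }

    public static void main(String[] args) {
        for (String target : new String[] {"C", "D'", "D", "A'"}) {
            System.out.println("MERGE_REQ for " + target + " is received by " + receiverOf(target));
        }
    }
}
```

      Under that assumption, D receives two MERGE_REQs (its own and the one for D'), and A receives the one for A', which is exactly the pattern of responses C then sees.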

      So what C sees is:

      • good responses from C and D, followed by merge_rejected responses from A and D
      • so it removes A' and D' from the merge (it didn't get responses from them)
      • then it removes D from the merge (because the most recent response from D said merge_rejected)
      • so it is left only with itself, and comes up with a consolidated view that is identical to its original view

      In short, the merge doesn't accomplish anything useful after all.
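      The leader's bookkeeping, as described above, can be modelled in a few lines. This is a hypothetical sketch, not the actual JGroups merge code: it assumes responses are keyed by the sender's address, that a later response from the same sender replaces an earlier one, and that a participant survives only if its latest response was positive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MergeResponseDemo {
    /** Hypothetical merge response: the member that sent it and whether it was merge_rejected. */
    public static final class Response {
        final String sender;
        final boolean rejected;

        public Response(String sender, boolean rejected) {
            this.sender = sender;
            this.rejected = rejected;
        }
    }

    /**
     * Keeps only the participants whose most recent response accepted the merge.
     * Responses are keyed by sender, so a later reply from the same member
     * overwrites an earlier one; participants that never replied are dropped.
     */
    public static List<String> survivingParticipants(List<String> participants, List<Response> responses) {
        Map<String, Boolean> latestRejected = new LinkedHashMap<>();
        for (Response r : responses) {
            latestRejected.put(r.sender, r.rejected); // last response from a sender wins
        }
        List<String> kept = new ArrayList<>();
        for (String p : participants) {
            Boolean rejected = latestRejected.get(p);
            if (rejected != null && !rejected) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> participants = List.of("C", "D'", "D", "A'");
        List<Response> responses = List.of(
                new Response("C", false),
                new Response("D", false),  // D answers the MERGE_REQ meant for it
                new Response("A", true),   // A rejects the MERGE_REQ meant for A'
                new Response("D", true));  // ~2.5 s later, D rejects the MERGE_REQ meant for D'
        System.out.println(survivingParticipants(participants, responses)); // prints [C]
    }
}
```

      Because D's rejection for D' arrives under D's own address, it overwrites D's earlier acceptance, and C ends up merging with nobody but itself.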

      I think the key here is the confusion between D and D'. The fix may be as simple as ignoring MERGE_REQs whose destination address is not the local address.

      I'll try this out and, if it looks good, submit a pull request.
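      The check proposed above can be sketched as a simple filter. This is a hypothetical illustration, not the JGroups API and not the patch that was eventually applied: a MERGE_REQ is represented only by its destination logical address, and the member compares that against its own address before handling the request.

```java
/**
 * Hypothetical sketch of the proposed fix: a member only answers a MERGE_REQ
 * whose destination is its own logical address. Not the JGroups API.
 */
public class MergeRequestFilter {
    /** Hypothetical stand-in for a received MERGE_REQ, carrying its destination address. */
    public static final class MergeRequest {
        final String dest;

        public MergeRequest(String dest) {
            this.dest = dest;
        }
    }

    private final String localAddress;

    public MergeRequestFilter(String localAddress) {
        this.localAddress = localAddress;
    }

    /** Returns true if the request should be processed, false if it should be silently ignored. */
    public boolean shouldHandle(MergeRequest req) {
        // A request addressed to a stale incarnation (e.g. D') is not for us,
        // so we neither accept nor reject it.
        return localAddress.equals(req.dest);
    }

    public static void main(String[] args) {
        MergeRequestFilter d = new MergeRequestFilter("D");
        System.out.println(d.shouldHandle(new MergeRequest("D")));   // prints true
        System.out.println(d.shouldHandle(new MergeRequest("D'")));  // prints false
    }
}
```

      With this check, D sends C only its positive reply, and the request meant for D' simply goes unanswered, which the leader already handles by dropping D' from the merge.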


    People

      rhn-engineering-bban Bela Ban
      dimbleby David Hotham (Inactive)