JGroups / JGRP-1493

Merge fails because failing to get physical address takes too long

Details

    • Feature Request
    • Resolution: Done
    • Major
    • 3.2
    • 3.1
    • None

    Description

      Start with the following views:

      • A, B and C all have {A,B,C}
      • D has {B', D, A, C'}, where B' and C' are dead.

      A decides to lead a merge (he's the only 'actual' coordinator). By the time we've been through view-sanitization and so on and reached getMergeDataFromSubgroupCoordinators(), coords are {D, C', A}.

      Here A tries to send MERGE_REQ to those members. However, A does not have a physical address for C', and in fact neither does anyone else. So when trying to send the MERGE_REQ to C', A will always spend a little over 5 seconds in TP.sendToSingleMember(), trying and failing to discover that physical address.

      Of course A won't get a response from C' either, so it will take another 5 seconds for merge_rsps.waitForAllResponses to time out.

      Together that is more than 10 seconds, which means that it's a sure thing that the MergeKiller will kick in first.

      Therefore the merge can never progress.

      (Presumably the situation would be even worse if D's view had contained further dead members).
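
      The timing argument above can be written out as simple arithmetic. The sketch below is only an illustrative model, not JGroups code: the class, the method names and the MergeKiller delay passed in main() are assumptions made for the example, and it assumes that sends to dead coordinators block one after another. The 5-second figures are rounded from the description.

```java
// Illustrative timing model only; none of these names are JGroups APIs.
final class MergeTimingSketch {

    /**
     * Estimated time (ms) the merge leader is blocked before it can proceed.
     * Assumes each dead coordinator costs one failed physical-address lookup,
     * performed sequentially, followed by one full response timeout.
     */
    static long blockedMillis(int deadCoords, long discoveryTimeoutMs, long responseTimeoutMs) {
        long sendPhase = (long) deadCoords * discoveryTimeoutMs;
        // Simplification: with no dead coordinators, responses are assumed to arrive at once.
        long waitPhase = deadCoords > 0 ? responseTimeoutMs : 0L;
        return sendPhase + waitPhase;
    }

    /** True if a merge killer scheduled after killerDelayMs fires before the leader is unblocked. */
    static boolean mergeKilledFirst(int deadCoords, long discoveryTimeoutMs,
                                    long responseTimeoutMs, long killerDelayMs) {
        return blockedMillis(deadCoords, discoveryTimeoutMs, responseTimeoutMs) >= killerDelayMs;
    }

    public static void main(String[] args) {
        long killerDelayMs = 10_000L; // hypothetical value, not taken from the JGroups source
        for (int dead = 0; dead <= 3; dead++) {
            System.out.println(dead + " dead coord(s): blocked "
                    + blockedMillis(dead, 5_000L, 5_000L) + " ms, killed first = "
                    + mergeKilledFirst(dead, 5_000L, 5_000L, killerDelayMs));
        }
    }
}
```

      Under these assumptions every additional dead coordinator adds another discovery timeout, so a fixed MergeKiller delay is eventually exceeded however generously it is set.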

      I expect to work around this by tweaking the timings somewhere: probably in startMergeKiller, so that the MergeKiller takes longer to be scheduled.

      I'd think that the right fix would be to arrange that the MergeTask is not blocked by TP having no physical address for a member.
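
      One possible shape for such a fix is sketched below: when no physical address is cached, the sender starts the lookup on a background executor and returns immediately instead of blocking the caller. This is a minimal sketch under stated assumptions, not the change actually made in JGroups. NonBlockingSenderSketch, trySend(), learn() and the use of plain strings for members and addresses are all hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.Function;

// Hypothetical sketch; these types do not correspond to JGroups classes.
final class NonBlockingSenderSketch {
    private final Map<String, String> physicalAddresses = new ConcurrentHashMap<>();
    private final Function<String, String> discovery; // may be slow; returns null on failure
    private final Executor background;

    NonBlockingSenderSketch(Function<String, String> discovery, Executor background) {
        this.discovery = discovery;
        this.background = background;
    }

    /** Records a known mapping from logical member name to physical address. */
    void learn(String member, String physicalAddress) {
        physicalAddresses.put(member, physicalAddress);
    }

    /**
     * Returns true if the payload was handed to the network, false if it was
     * dropped because no physical address is cached yet. The lookup is handed
     * to the background executor, so how long the caller waits depends on that executor.
     */
    boolean trySend(String member, String payload) {
        String address = physicalAddresses.get(member);
        if (address == null) {
            // Resolve the address in the background so a later send can succeed.
            background.execute(() -> {
                String found = discovery.apply(member);
                if (found != null) {
                    physicalAddresses.put(member, found);
                }
            });
            return false;
        }
        transmit(address, payload);
        return true;
    }

    private void transmit(String address, String payload) {
        // Stand-in for the real network send.
    }
}
```

      With a sender like this, a request to a dead member such as C' would be dropped at once, and the merge leader would only have to wait for the response timeout rather than for discovery as well.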

          People

            rhn-engineering-bban Bela Ban
            dimbleby David Hotham (Inactive)