JGroups / JGRP-2875

NAKACK4 / UNICAST4: multiple hung members might block view changes on full send-window


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 5.4.7

      When multiple members are unresponsive (e.g. hung) and unable to send back ACKs, the coordinator might block when sending a view change. This is an edge case that almost never occurs, but it requires a fix anyway.
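A minimal sketch of why silent members block the sender, assuming a simplified ACK-based send-window. The classes `AckWindow` and `AckWindowDemo` and their methods are hypothetical illustrations, not the ReliableMulticast API. In this model the sender can only reap messages up to the lowest sequence number acknowledged by every expected member, so one member that never ACKs keeps the window full, and only removing that member from the expected set frees it.

```java
import java.util.HashMap;
import java.util.Map;

class AckWindowDemo {
    public static void main(String[] args) {
        AckWindow w = new AckWindow(3, "B", "C", "D");   // A multicasts to B, C and D
        while (w.trySend()) { }                          // fill the window: seqnos 1..3
        w.ack("B", 3);                                   // B is healthy and ACKs everything
        System.out.println(w.trySend());                 // false: C and D haven't ACKed
        w.removeMember("D");                             // D removed from the view (V1)
        System.out.println(w.trySend());                 // false: C is still silent
        w.removeMember("C");                             // C removed as well
        System.out.println(w.trySend());                 // true: window was reaped
    }
}

// Hypothetical toy model of an ACK-based send-window (not the JGroups API).
class AckWindow {
    private final int capacity;                              // max unacknowledged messages
    private final Map<String, Long> acks = new HashMap<>();  // member -> highest seqno ACKed
    private long highestSent = 0;                            // seqno of the last message sent
    private long purgedUpTo = 0;                             // messages <= this were reaped

    AckWindow(int capacity, String... members) {
        this.capacity = capacity;
        for (String m : members) acks.put(m, 0L);
    }

    /** Messages still held because at least one expected member hasn't ACKed them. */
    synchronized long size() { return highestSent - purgedUpTo; }

    /** Non-blocking send: returns false where a real sender would block on a full window. */
    synchronized boolean trySend() {
        if (size() >= capacity) return false;
        highestSent++;
        return true;
    }

    synchronized void ack(String member, long seqno) {
        acks.computeIfPresent(member, (m, old) -> Math.max(old, seqno));
        purge();
    }

    /** Stop expecting ACKs from a member that was removed from the view. */
    synchronized void removeMember(String member) {
        acks.remove(member);
        purge();
    }

    // Reap everything that all remaining members have acknowledged.
    private void purge() {
        long minAcked = acks.values().stream().mapToLong(Long::longValue).min().orElse(highestSent);
        purgedUpTo = Math.max(purgedUpTo, Math.min(minAcked, highestSent));
    }
}
```

In `main`, removing D alone (as V1 does) is not enough; in this model the window only drains once C is removed as well.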

      Consider view {A,B,C,D}, where A is the coordinator:

      • Members C and D are unresponsive (e.g. out of memory, kernel panic, loss of power, etc.)
      • A has a full send-window; it hasn't received ACKs from C and D for a while and was therefore not able to reap the send-window (ReliableMulticast)
      • A gets a SUSPECT(D) event
      • A creates new view V1={A,B,C} and sends it
      • A blocks in ReliableMulticast when sending V1: although D was removed from the set of expected ACKs, C still doesn't send ACKs (which would purge, and thus unblock, A's send-window)
      • A gets a SUSPECT(C) event, which would create view V2. However, the sole processor thread in GMS is blocked sending V1, so V2 will not be created until the processing of V1 has completed. That never happens, because V2, which would unblock V1 by removing C from A's send-window, is never sent
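The scenario above can be modelled as follows, under simplifying assumptions: a single-threaded executor stands in for the sole GMS processor thread, and the hypothetical `FullWindow` class reduces the send-window to "full while any silent member is still expected". None of these names (`GmsDeadlockDemo`, `FullWindow`, `run`, `runWithoutQueueing`) are JGroups APIs. The timeout in `send()` exists only so the demo terminates; without it, the V1 task would block forever and SUSPECT(C), queued behind it, would never run.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class GmsDeadlockDemo {
    /** Returns true if V1 could be sent before the timeout; false if it stayed blocked. */
    static boolean run(long timeoutMs) {
        FullWindow window = new FullWindow("C", "D");               // C and D never ACK
        ExecutorService gms = Executors.newSingleThreadExecutor();  // sole GMS processor thread
        try {
            // SUSPECT(D): create V1={A,B,C}, drop D from the expected ACKs, then send V1
            Future<Boolean> v1 = gms.submit(() -> {
                window.removeMember("D");
                return window.send(timeoutMs);                      // blocks: C is still silent
            });
            // SUSPECT(C): would create V2 and drop C, but is queued behind V1
            Future<?> v2 = gms.submit(() -> window.removeMember("C"));
            boolean sent = v1.get();
            v2.get();                                               // runs only after V1 gave up
            return sent;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            gms.shutdownNow();
        }
    }

    /** Contrast: C is removed by another thread, so it doesn't wait behind V1. */
    static boolean runWithoutQueueing(long timeoutMs) {
        FullWindow window = new FullWindow("C", "D");
        window.removeMember("D");
        Thread suspectC = new Thread(() -> window.removeMember("C"));
        suspectC.start();
        try {
            boolean sent = window.send(timeoutMs);                  // woken by notifyAll()
            suspectC.join();
            return sent;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("V1 sent: " + run(500));                 // prints "V1 sent: false"
        System.out.println("without queueing: " + runWithoutQueueing(5000));
    }
}

// Hypothetical model: the send-window stays full while any silent member is still expected.
class FullWindow {
    private final Set<String> silentExpected = new HashSet<>();

    FullWindow(String... silentMembers) {
        Collections.addAll(silentExpected, silentMembers);
    }

    /** Stop expecting ACKs from a member that was removed from the view. */
    synchronized void removeMember(String member) {
        silentExpected.remove(member);
        notifyAll();                                                // wake a blocked sender
    }

    /** Blocks until no silent member is expected any more, or the timeout expires. */
    synchronized boolean send(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!silentExpected.isEmpty()) {
            long leftMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (leftMs <= 0) return false;                          // gave up: window still full
            wait(leftMs);
        }
        return true;
    }
}
```

`runWithoutQueueing()` is only a contrast, not the fix applied in JGroups: when C's removal does not have to wait behind V1 on the same thread, the blocked send completes.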

              Bela Ban
              Votes: 0
              Watchers: 2
