JGroups / JGRP-1977

More redundant initial join logic to avoid becoming a fake coordinator


    • Type: Enhancement
    • Resolution: Won't Do
    • Priority: Major
    • 3.6.7
    • None
    • None

      If the very first JGroups discovery packet is lost, the current GMS join logic never recovers from it. The node then becomes a standalone coordinator and only merges with the existing cluster several minutes later.

      This can happen if a new node resides in a different network segment and the switch between the segments needs some time to establish a new multicast route. Currently there is not enough time between the IGMP join (performed by MulticastSocket#joinGroup()) and the first JGroups discovery packet, so the latter is lost in such a network environment. Because the number of nodes can be very large, configuring a static route in the switch is not a reasonable workaround.

      Specifically, in org.jgroups.protocols.pbcast.ClientGmsImpl#joinInternal(), part of the initial discovery step, gms.getDownProtocol().down(Event.FIND_INITIAL_MBRS_EVT), lies outside of the retry loop governed by GMS.max_join_attempts and GMS.join_timeout, so a lost discovery round is not retried.
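      The difference between the reported behaviour and the requested one can be illustrated with a small, self-contained model. This is a sketch under stated assumptions, not JGroups code and not a patch: the class JoinRetrySketch, the Discovery interface and the methods joinCurrent, joinWithRetriedDiscovery and lossyDiscovery are all hypothetical names, one "discovery round" stands in for the FIND_INITIAL_MBRS_EVT call, and timing, join requests and view installation are left out. Note also that the issue was resolved as Won't Do, so the second method only models what was asked for.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the join decision described in JGRP-1977.
// None of these names exist in JGroups; timing, join requests and view
// installation are omitted so that only the discovery/retry structure remains.
final class JoinRetrySketch {

    // Stand-in for one discovery round (the role played by
    // gms.getDownProtocol().down(Event.FIND_INITIAL_MBRS_EVT) in the report):
    // returns the coordinators that answered, or an empty list if packets were lost.
    interface Discovery {
        List<String> findInitialMembers();
    }

    // Behaviour as reported: a single discovery round decides everything.
    // If it comes back empty, the node immediately becomes a singleton coordinator.
    static String joinCurrent(Discovery discovery, String self) {
        List<String> coords = discovery.findInitialMembers();
        if (coords.isEmpty()) {
            return self;            // standalone ("fake") coordinator; a later merge is needed
        }
        return coords.get(0);       // join the discovered coordinator
    }

    // Requested behaviour: discovery is repeated inside a bounded retry loop
    // (maxJoinAttempts plays the role of GMS.max_join_attempts in this model).
    static String joinWithRetriedDiscovery(Discovery discovery, String self, int maxJoinAttempts) {
        for (int attempt = 1; attempt <= maxJoinAttempts; attempt++) {
            List<String> coords = discovery.findInitialMembers();
            if (!coords.isEmpty()) {
                return coords.get(0);
            }
            // A real implementation would wait here (e.g. for GMS.join_timeout)
            // to give the switch time to establish the multicast route.
        }
        return self;                // nobody answered after all attempts: become coordinator
    }

    // Simulated network in which the first `lostRuns` discovery rounds are dropped,
    // e.g. because the multicast route between segments is not yet established.
    static Discovery lossyDiscovery(int lostRuns, String coordinator) {
        int[] calls = {0};
        return () -> {
            calls[0]++;
            if (calls[0] <= lostRuns) {
                return Collections.emptyList();
            }
            return Collections.singletonList(coordinator);
        };
    }
}
```

      With one lost discovery round, joinCurrent(lossyDiscovery(1, "A"), "B") returns "B" (the new node becomes its own coordinator), whereas joinWithRetriedDiscovery(lossyDiscovery(1, "A"), "B", 3) returns "A" because the second round gets through. The retry stays bounded, so a genuinely first member still becomes coordinator after the last attempt.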

              Bela Ban
              Osamu Nagano
              Votes: 0
              Watchers: 4
