JGroups / JGRP-1846

RELAY2: delay shutting down bridge

Details

    • Bug
    • Resolution: Done
    • Major
    • 3.5
    • 3.5
    • None

    Description

      A simple test that starts two sites with two nodes each and then shuts the nodes down in order shows a 1-second delay when the last node in the first site (B) is shut down:

          public void testCoordinatorShutdown() throws Exception {
             a=createNode(LON, "A", LON_CLUSTER, null);
             b=createNode(LON, "B", LON_CLUSTER, null);
             x=createNode(SFO, "X", SFO_CLUSTER, null);
             y=createNode(SFO, "Y", SFO_CLUSTER, null);
             Util.waitUntilAllChannelsHaveSameSize(10000, 100, a, b);
             Util.waitUntilAllChannelsHaveSameSize(10000, 100, x, y);
             waitForBridgeView(2, 20000, 100, a, x);
      
             a.close();
             Util.waitUntilAllChannelsHaveSameSize(10000, 100, b);
      
             b.close();
             waitForBridgeView(1, 20000, 100, x);
      
             x.close();
      
             y.close();
          }
      

      And the relevant logs:

      13:51:30,017 DEBUG (Timer-2,sfo-cluster,X:) [GMS] _X:sfo: installing view [_A:lon|1] (2) [_A:lon, _X:sfo]
      13:51:30,028 DEBUG (Incoming-2,global,_X:sfo:) [GMS] _X:sfo: installing view [_X:sfo|2] (1) [_X:sfo]
      13:51:30,046 TRACE (Timer-2,lon-cluster,B:) [SHARED_LOOPBACK] _B:lon: sending msg to _X:sfo, src=_B:lon, headers are GMS: GmsHeader[JOIN_REQ]: mbr=_B:lon, UNICAST3: DATA, seqno=1, first, SHARED_LOOPBACK: [cluster_name=global]
      13:51:31,046 TRACE (Timer-2,global,_B:lon:) [SHARED_LOOPBACK] _B:lon: sending msg to _X:sfo, src=_B:lon, headers are GMS: GmsHeader[JOIN_REQ]: mbr=_B:lon, UNICAST3: DATA, seqno=1, first, SHARED_LOOPBACK: [cluster_name=global]
      13:51:31,099 DEBUG (Incoming-2,global,_X:sfo:) [GMS] _X:sfo: installing view [_X:sfo|3] (2) [_X:sfo, _B:lon]
      

      Note that although this happens on a background timer thread, the shutdown is delayed nonetheless, because TP.destroy() waits at least 500ms for all the timer threads to finish (TimeScheduler3.stopRunning()). Perhaps that should change as well, so that timer threads are interrupted and finish immediately.
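
      The difference between waiting for a timer thread and interrupting it can be illustrated outside of JGroups. The following is only a minimal sketch using plain java.lang.Thread, not the TimeScheduler3 or TP code. The class and method names are hypothetical, and the 1000 ms sleep and 500 ms wait merely mirror the numbers in this report.

```java
import java.util.concurrent.TimeUnit;

public class TimerShutdownSketch {

    /** Starts a daemon thread that stands in for a timer task sleeping before a retry. */
    static Thread startSleepingTask(long sleepMillis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Interrupted: restore the flag and let the thread exit right away
                Thread.currentThread().interrupt();
            }
        }, "hypothetical-timer");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Waits up to waitMillis for the task to finish by itself; returns the elapsed ms. */
    static long stopByWaiting(Thread t, long waitMillis) throws InterruptedException {
        long start = System.nanoTime();
        t.join(waitMillis);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    /** Interrupts the task and then waits for it to exit; returns the elapsed ms. */
    static long stopByInterrupting(Thread t) throws InterruptedException {
        long start = System.nanoTime();
        t.interrupt();
        t.join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        long waited = stopByWaiting(startSleepingTask(1000), 500);
        long interrupted = stopByInterrupting(startSleepingTask(1000));
        System.out.println("wait-based stop took about " + waited + " ms");
        System.out.println("interrupt-based stop took about " + interrupted + " ms");
    }
}
```

      In this sketch the wait-based stop blocks for the whole wait period because the task is still sleeping, whereas the interrupt-based stop returns as soon as the sleeping thread handles the interrupt. Whether interrupting is safe for real timer tasks depends on how each task reacts to InterruptedException.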

          People

            rhn-engineering-bban Bela Ban
            dberinde@redhat.com Dan Berindei (Inactive)