Bug
Resolution: Done
Major
3.5
None
A simple test that starts 2 sites x 2 nodes each and shuts them down in order shows a 1 second delay when shutting down the last node in the first site (B):
public void testCoordinatorShutdown() throws Exception {
    a=createNode(LON, "A", LON_CLUSTER, null);
    b=createNode(LON, "B", LON_CLUSTER, null);
    x=createNode(SFO, "X", SFO_CLUSTER, null);
    y=createNode(SFO, "Y", SFO_CLUSTER, null);
    Util.waitUntilAllChannelsHaveSameSize(10000, 100, a, b);
    Util.waitUntilAllChannelsHaveSameSize(10000, 100, x, y);
    waitForBridgeView(2, 20000, 100, a, x);
    a.close();
    Util.waitUntilAllChannelsHaveSameSize(10000, 100, b);
    b.close();
    waitForBridgeView(1, 20000, 100, x);
    x.close();
    y.close();
}
And the relevant logs:
13:51:30,017 DEBUG (Timer-2,sfo-cluster,X:) [GMS] _X:sfo: installing view [_A:lon|1] (2) [_A:lon, _X:sfo]
13:51:30,028 DEBUG (Incoming-2,global,_X:sfo:) [GMS] _X:sfo: installing view [_X:sfo|2] (1) [_X:sfo]
13:51:30,046 TRACE (Timer-2,lon-cluster,B:) [SHARED_LOOPBACK] _B:lon: sending msg to _X:sfo, src=_B:lon, headers are GMS: GmsHeader[JOIN_REQ]: mbr=_B:lon, UNICAST3: DATA, seqno=1, first, SHARED_LOOPBACK: [cluster_name=global]
13:51:31,046 TRACE (Timer-2,global,_B:lon:) [SHARED_LOOPBACK] _B:lon: sending msg to _X:sfo, src=_B:lon, headers are GMS: GmsHeader[JOIN_REQ]: mbr=_B:lon, UNICAST3: DATA, seqno=1, first, SHARED_LOOPBACK: [cluster_name=global]
13:51:31,099 DEBUG (Incoming-2,global,_X:sfo:) [GMS] _X:sfo: installing view [_X:sfo|3] (2) [_X:sfo, _B:lon]
Note that although this happens on a background timer thread, the shutdown is still delayed, because TP.destroy() waits at least 500ms for all timer threads to finish (TimeScheduler3.stopRunning()). Perhaps that should change as well, so that timer threads are interrupted and finish immediately.
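To illustrate the difference between waiting for timer threads and interrupting them, here is a minimal sketch using only java.util.concurrent. It is not JGroups code and does not reproduce TimeScheduler3.stopRunning(); the class TimerShutdownSketch, its methods, and the 2 second blocking task (a stand-in for a timer task that is still retrying) are hypothetical names and values chosen for the example. The 500ms bound mirrors the wait mentioned above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only: compares a bounded wait with an interrupt on shutdown.
public class TimerShutdownSketch {

    // Starts a single-threaded timer whose only task blocks for 2 seconds
    // (assumed stand-in for a timer task that is still busy at shutdown).
    static ScheduledExecutorService startBusyTimer() throws InterruptedException {
        ScheduledExecutorService timer = Executors.newScheduledThreadPool(1);
        CountDownLatch started = new CountDownLatch(1);
        timer.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(2000);
            } catch (InterruptedException e) {
                // Interrupted during shutdown: restore the flag and return at once.
                Thread.currentThread().interrupt();
            }
        });
        started.await(); // make sure the task is running before we stop the timer
        return timer;
    }

    // Stops the timer and returns how long the stop took, in milliseconds.
    // interrupt=false: shutdown() plus a bounded wait of up to 500ms.
    // interrupt=true:  shutdownNow() interrupts the running task first.
    static long stop(boolean interrupt) {
        try {
            ScheduledExecutorService timer = startBusyTimer();
            long start = System.nanoTime();
            if (interrupt) {
                timer.shutdownNow();
            } else {
                timer.shutdown();
            }
            timer.awaitTermination(500, TimeUnit.MILLISECONDS);
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            timer.shutdownNow(); // cleanup so the worker thread does not outlive the demo
            return elapsedMs;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while stopping timer", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("wait only: " + stop(false) + " ms");
        System.out.println("interrupt: " + stop(true) + " ms");
    }
}
```

In this sketch the wait-only variant should use up the full 500ms because the task never finishes in time, while the interrupting variant should return almost immediately. Whether interrupting is safe for the real timer tasks depends on how they handle InterruptedException, which this example does not address.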