GMS.ViewHandler.run() has this code:
wait_time=timeout - (System.nanoTime() - start_time); if(wait_time > 0 && firstRequest.canBeProcessedTogether(firstRequest)) { // JGRP-1438 long wait_time_ms=TimeUnit.MILLISECONDS.convert(wait_time, TimeUnit.NANOSECONDS); queue.waitUntilClosed(wait_time_ms); // misnomer: waits until element has been added or q closed }
The problem is wait_time_ms will be 0 if 0 < wait_time < 1_000_000. queue.waitUntilClosed(wait_time_ms) then calls Object.wait(0) and blocks forever.
Fortunately, the joiners re-sends the JOIN_REQ message after GMS.join_timeout (3s by default), so all that happens is that the view is delayed by 3s. It does cause some random failures in tests that expect the view to form in a precise amount of time, though.
17:04:53,034 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending JOIN(NodeF-49399) to NodeE-37644 17:04:53,034 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN] 17:04:53,035 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending JOIN(NodeG-49307) to NodeE-37644 17:04:53,035 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN] 17:04:53,060 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN] 17:04:53,060 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN] 17:04:53,060 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeF-49399: #1, conn_id=0, first) 17:04:53,060 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeG-49307: #1, conn_id=0, first) 17:04:53,061 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#1 17:04:53,061 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#1 17:04:56,035 WARN (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: JOIN(NodeF-49399) sent to NodeE-37644 timed out (after 3000 ms), on try 1 17:04:56,035 WARN (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: JOIN(NodeG-49307) sent to NodeE-37644 timed out (after 3000 ms), on try 1 17:04:56,036 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending JOIN(NodeF-49399) to NodeE-37644 17:04:56,036 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN] 17:04:56,036 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending JOIN(NodeG-49307) to NodeE-37644 17:04:56,036 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN] 17:04:56,057 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN] 17:04:56,057 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN] 17:04:56,057 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#2 17:04:56,057 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#2 # finally 17:04:56,057 TRACE (ViewHandler,NodeE-37644:) [GMS] NodeE-37644: joiners=[NodeG-49307, NodeF-49399], suspected=[], leaving=[], new view: [NodeE-37644|1] (3) [NodeE-37644, NodeG-49307, NodeF-49399]
- blocks
-
JBEAP-4655 Not blocking forever if wait_time_ms == 0
- Closed