Uploaded image for project: 'JGroups'
  1. JGroups
  2. JGRP-2028

GMS sometimes ignores view bundling timeout

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Minor Minor
    • 3.6.9, 4.0
    • 3.6.8
    • None

      GMS.ViewHandler.run() has this code:

          wait_time=timeout - (System.nanoTime() - start_time);
          if(wait_time > 0 && firstRequest.canBeProcessedTogether(firstRequest)) { // JGRP-1438
              long wait_time_ms=TimeUnit.MILLISECONDS.convert(wait_time, TimeUnit.NANOSECONDS);
              queue.waitUntilClosed(wait_time_ms); // misnomer: waits until element has been added or q closed
          }
      

      The problem is wait_time_ms will be 0 if 0 < wait_time < 1_000_000. queue.waitUntilClosed(wait_time_ms) then calls Object.wait(0) and blocks forever.

      Fortunately, the joiners re-sends the JOIN_REQ message after GMS.join_timeout (3s by default), so all that happens is that the view is delayed by 3s. It does cause some random failures in tests that expect the view to form in a precise amount of time, though.

      17:04:53,034 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending JOIN(NodeF-49399) to NodeE-37644
      17:04:53,034 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
      17:04:53,035 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending JOIN(NodeG-49307) to NodeE-37644
      17:04:53,035 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
      17:04:53,060 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
      17:04:53,060 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=1, first, TP: [cluster_name=ISPN]
      17:04:53,060 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeF-49399: #1, conn_id=0, first)
      17:04:53,060 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644 <-- DATA(NodeG-49307: #1, conn_id=0, first)
      17:04:53,061 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#1
      17:04:53,061 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#1
      17:04:56,035 WARN  (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: JOIN(NodeF-49399) sent to NodeE-37644 timed out (after 3000 ms), on try 1
      17:04:56,035 WARN  (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: JOIN(NodeG-49307) sent to NodeE-37644 timed out (after 3000 ms), on try 1
      17:04:56,036 DEBUG (ForkThread-3,InitialClusterSizeTest:) [GMS] NodeF-49399: sending JOIN(NodeF-49399) to NodeE-37644
      17:04:56,036 TRACE (ForkThread-3,InitialClusterSizeTest:) [TCP_NIO2] NodeF-49399: sending msg to NodeE-37644, src=NodeF-49399, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
      17:04:56,036 DEBUG (ForkThread-4,InitialClusterSizeTest:) [GMS] NodeG-49307: sending JOIN(NodeG-49307) to NodeE-37644
      17:04:56,036 TRACE (ForkThread-4,InitialClusterSizeTest:) [TCP_NIO2] NodeG-49307: sending msg to NodeE-37644, src=NodeG-49307, headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
      17:04:56,057 TRACE (INT-1,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeF-49399 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeF-49399, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
      17:04:56,057 TRACE (INT-2,NodeE-37644:) [TCP_NIO2] NodeE-37644: received [dst: NodeE-37644, src: NodeG-49307 (3 headers), size=0 bytes, flags=OOB|INTERNAL], headers are GMS: GmsHeader[JOIN_REQ]: mbr=NodeG-49307, UNICAST3: DATA, seqno=2, TP: [cluster_name=ISPN]
      17:04:56,057 TRACE (INT-2,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeG-49307#2
      17:04:56,057 TRACE (INT-1,NodeE-37644:) [UNICAST3] NodeE-37644: delivering NodeF-49399#2
      # finally
      17:04:56,057 TRACE (ViewHandler,NodeE-37644:) [GMS] NodeE-37644: joiners=[NodeG-49307, NodeF-49399], suspected=[], leaving=[], new view: [NodeE-37644|1] (3) [NodeE-37644, NodeG-49307, NodeF-49399]
      

              rhn-engineering-bban Bela Ban
              dberinde@redhat.com Dan Berindei (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: