Uploaded image for project: 'JGroups'
  1. JGroups
  2. JGRP-1003

deadlock in TP.send - socket write hangs inside a Lock, preventing other threads to do TP.send()

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Won't Do
    • Icon: Major Major
    • 2.10
    • 2.7
    • None
    • Linux (Debian), kernel 2.6.x

    • Workaround Exists
    • Hide

      I think it is possible to use TCP_NIO instead o TCP in protocol stack - i.e. non=blocking I/O, so socket will never block.

      Show
      I think it is possible to use TCP_NIO instead o TCP in protocol stack - i.e. non=blocking I/O, so socket will never block.

      I am using JGroups 2.7.0 GA with the typical protocols stack based on TCP (taken from tcp.xml).

      Sometimes it occurs that socket write operation hangs inside TP.send() – this is not so untypical for blocking I/O approach! – after that JGroups is not working at all because most of threads are blocked inside TP.send() waiting for that first thread releasing "out_stream_lock". Below is a fragment from my stack trace.

      1) the thread that hangs in socket write until forever (maybe due to a broken socket or something else); this thread owns "out_stream_lock in TP class":

      "Timer-1,name,IP:port" daemon prio=10 tid=0x084e5c00 nid=0x15f0 runnable [0x211c1000..0x211c2040]
      java.lang.Thread.State: RUNNABLE
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)

      • locked <0x43427300> (a java.io.BufferedOutputStream)
        at java.io.DataOutputStream.flush(DataOutputStream.java:106)
        at org.jgroups.blocks.TCPConnectionMap$TCPConnection.doSend(TCPConnectionMap.java:472)
        at org.jgroups.blocks.TCPConnectionMap$TCPConnection._send(TCPConnectionMap.java:453)
        at org.jgroups.blocks.TCPConnectionMap$TCPConnection.send(TCPConnectionMap.java:436)
        at org.jgroups.blocks.TCPConnectionMap$TCPConnection.access$100(TCPConnectionMap.java:341)
        at org.jgroups.blocks.TCPConnectionMap.send(TCPConnectionMap.java:137)
        at org.jgroups.protocols.TCP.send(TCP.java:53)
        at org.jgroups.protocols.BasicTCP.sendToSingleMember(BasicTCP.java:141)
        at org.jgroups.protocols.TP.doSend(TP.java:1105)
        at org.jgroups.protocols.TP.send(TP.java:1088)
        at org.jgroups.protocols.TP.down(TP.java:907)
        at org.jgroups.protocols.Discovery.down(Discovery.java:363)
        at org.jgroups.protocols.MERGE2.down(MERGE2.java:169)
        at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:333)
        at org.jgroups.protocols.FD.down(FD.java:327)
        at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:72)
        at org.jgroups.protocols.BARRIER.down(BARRIER.java:96)
        at org.jgroups.protocols.pbcast.NAKACK.retransmit(NAKACK.java:1530)
        at org.jgroups.protocols.pbcast.NAKACK.retransmit(NAKACK.java:1476)
        at org.jgroups.stack.Retransmitter$Task.run(Retransmitter.java:207)
        at org.jgroups.util.TimeScheduler$TaskWrapper.run(TimeScheduler.java:218)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:207)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

      Locked ownable synchronizers:

      • <0x2b5522e0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      • <0x2b5550f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      • <0x434273f0> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

      2) and there are many threads like this blocked "on out_stream_lock.lock()" call inside TP class:

      "Connection.Receiver [ip:port - ip:port],name,ip:port" prio=10 tid=0x020dc000 nid=0x1fab waiting on condition [0x1365f000..0x1365ffc0]
      java.lang.Thread.State: WAITING (parking)
      at sun.misc.Unsafe.park(Native Method)

      • parking to wait for <0x2b5550f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
        at org.jgroups.protocols.TP.send(TP.java:1082)
        at org.jgroups.protocols.TP.down(TP.java:907)
        at org.jgroups.protocols.Discovery.up(Discovery.java:274)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:995)
        at org.jgroups.protocols.TP.access$100(TP.java:52)
        at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1451)
        at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1427)
        at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:1738)
        at org.jgroups.util.ShutdownRejectedExecutionHandler.rejectedExecution(ShutdownRejectedExecutionHandler.java:34)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.jgroups.protocols.TP.dispatchToThreadPool(TP.java:1061)
        at org.jgroups.protocols.TP.receive(TP.java:1038)
        at org.jgroups.protocols.BasicTCP.receive(BasicTCP.java:180)
        at org.jgroups.blocks.TCPConnectionMap$TCPConnection$ConnectionPeerReceiver.run(TCPConnectionMap.java:553)
        at java.lang.Thread.run(Thread.java:619)

      Locked ownable synchronizers:

      • None

      (Since all threads from JGroups' ThreadPool are locked, you can see "rejectedExecution" in the last stack trace).

      Maybe we could implement some solution for TCP (blocking I/O)? Maybe it is possible to release the lock before socket write?

        1. stacktrace.txt
          51 kB
        2. jgroups_stacktrace_09feb2010.txt
          58 kB
        3. jgroups_s2.txt
          57 kB
        4. j1__.txt
          123 kB

              rhn-engineering-bban Bela Ban
              legacy_external_user Legacy User (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated:
                Resolved: