JGroups / JGRP-787

UNICAST over TCP with xmit_off=true: sending message in synchronized block leads to deadlocks

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • 2.6.3, 2.7
    • None
    • None

      Same issue as http://jira.jboss.com/jira/browse/JGRP-303, which is why we moved the send() outside the synchronized block.
      The problem with xmit_off, though, is that we need to know that the message was passed to TCP/IP successfully; otherwise we CANNOT increment the sequence number!
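
      The trade-off described above can be sketched as follows. This is only an illustration, not the UNICAST code and not the fix that was applied: Transport, Entry, sendUnderLock() and sendOutsideLock() are hypothetical names made up for the example. Variant A advances the sequence number only after a successful send but holds the monitor across the whole send; variant B releases the monitor before sending, so a failed send leaves a gap in the sequence.

```java
import java.io.IOException;

public class SeqnoSendSketch {

    /** Hypothetical stand-in for the transport below UNICAST (e.g. TCP). */
    interface Transport {
        void send(long seqno, String payload) throws IOException;
    }

    /** Hypothetical per-destination state; not the real UNICAST$Entry. */
    static final class Entry {
        long nextSeqno = 1;
    }

    /**
     * Variant A: send while holding the entry monitor.
     * The seqno advances only if the send succeeded, but the monitor is
     * held for the whole duration of the send.
     */
    static boolean sendUnderLock(Entry entry, Transport transport, String payload) {
        synchronized (entry) {
            try {
                transport.send(entry.nextSeqno, payload);
                entry.nextSeqno++;          // incremented only after a successful send
                return true;
            } catch (IOException e) {
                return false;               // seqno unchanged, the next message reuses it
            }
        }
    }

    /**
     * Variant B: reserve the seqno under the monitor, send after releasing it.
     * No lock is held during the send, but a failed send has already
     * consumed its seqno.
     */
    static boolean sendOutsideLock(Entry entry, Transport transport, String payload) {
        long seqno;
        synchronized (entry) {
            seqno = entry.nextSeqno++;      // reserved before we know the outcome
        }
        try {
            transport.send(seqno, payload);
            return true;
        } catch (IOException e) {
            return false;                   // seqno is gone: the receiver would see a gap
        }
    }

    public static void main(String[] args) {
        Transport failing = (seqno, payload) -> {
            throw new IOException("connection closed");
        };
        Entry a = new Entry();
        Entry b = new Entry();
        sendUnderLock(a, failing, "m1");
        sendOutsideLock(b, failing, "m1");
        System.out.println("under lock:   next seqno = " + a.nextSeqno);
        System.out.println("outside lock: next seqno = " + b.nextSeqno);
    }
}
```

      With a transport that always fails, variant A leaves the next seqno at 1 while variant B has already moved it to 2, which is the gap that must not occur when there is no retransmission to fill it.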

      Stack trace:

      Found one Java-level deadlock:
      =============================
      "Incoming-27,UnicastTest-Group,192.168.1.5:7500":
      waiting for ownable synchronizer 0x00002aaac0921168, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
      which is held by "Incoming-4,UnicastTest-Group,192.168.1.5:7500"
      "Incoming-4,UnicastTest-Group,192.168.1.5:7500":
      waiting to lock monitor 0x00002aaacc8e9cf0 (object 0x00002aaac09e3a88, a org.jgroups.protocols.UNICAST$Entry),
      which is held by "main"
      "main":
      waiting for ownable synchronizer 0x00002aaac0921168, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
      which is held by "Incoming-4,UnicastTest-Group,192.168.1.5:7500"

      Java stack information for the threads listed above:
      ===================================================
      "Incoming-27,UnicastTest-Group,192.168.1.5:7500":
      at sun.misc.Unsafe.park(Native Method)
      - parking to wait for <0x00002aaac0921168> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
        at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:635)
        at org.jgroups.protocols.UNICAST.up(UNICAST.java:292)
        at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:735)
        at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
        at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
        at org.jgroups.protocols.Discovery.up(Discovery.java:244)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1266)
        at org.jgroups.protocols.TP.access$100(TP.java:49)
        at org.jgroups.protocols.TP$1.run(TP.java:1169)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)
      "Incoming-4,UnicastTest-Group,192.168.1.5:7500":
      at org.jgroups.protocols.UNICAST.down(UNICAST.java:357)
      - waiting to lock <0x00002aaac09e3a88> (a org.jgroups.protocols.UNICAST$Entry)
        at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:316)
        at org.jgroups.protocols.VIEW_SYNC.down(VIEW_SYNC.java:204)
        at org.jgroups.protocols.pbcast.GMS.down(GMS.java:859)
        at org.jgroups.protocols.FC.sendCredit(FC.java:740)
        at org.jgroups.protocols.FC.up(FC.java:416)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:788)
        at org.jgroups.protocols.VIEW_SYNC.up(VIEW_SYNC.java:192)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:233)
        at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:645)
        at org.jgroups.protocols.UNICAST.up(UNICAST.java:292)
        at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:735)
        at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
        at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
        at org.jgroups.protocols.Discovery.up(Discovery.java:244)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1266)
        at org.jgroups.protocols.TP.access$100(TP.java:49)
        at org.jgroups.protocols.TP$1.run(TP.java:1169)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
        at java.lang.Thread.run(Thread.java:619)
      "main":
      at sun.misc.Unsafe.park(Native Method)
      - parking to wait for <0x00002aaac0921168> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:747)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:778)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1114)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
        at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:635)
        at org.jgroups.protocols.UNICAST.up(UNICAST.java:292)
        at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:735)
        at org.jgroups.protocols.BARRIER.up(BARRIER.java:136)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:167)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:309)
        at org.jgroups.protocols.MERGE2.up(MERGE2.java:144)
        at org.jgroups.protocols.Discovery.up(Discovery.java:244)
        at org.jgroups.protocols.TP.passMessageUp(TP.java:1266)
        at org.jgroups.protocols.TP.access$100(TP.java:49)
        at org.jgroups.protocols.TP$1.run(TP.java:1169)
        at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:1737)
        at org.jgroups.util.ShutdownRejectedExecutionHandler.rejectedExecution(ShutdownRejectedExecutionHandler.java:39)
        at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
        at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
        at org.jgroups.protocols.TP.down(TP.java:1167)
        at org.jgroups.protocols.Discovery.down(Discovery.java:349)
        at org.jgroups.protocols.MERGE2.down(MERGE2.java:175)
        at org.jgroups.protocols.FD_SOCK.down(FD_SOCK.java:373)
        at org.jgroups.protocols.VERIFY_SUSPECT.down(VERIFY_SUSPECT.java:95)
        at org.jgroups.protocols.BARRIER.down(BARRIER.java:107)
        at org.jgroups.protocols.pbcast.NAKACK.down(NAKACK.java:660)
        at org.jgroups.protocols.UNICAST.send(UNICAST.java:484)
        at org.jgroups.protocols.UNICAST.down(UNICAST.java:373)
      - locked <0x00002aaac09e3a88> (a org.jgroups.protocols.UNICAST$Entry)
        at org.jgroups.protocols.pbcast.STABLE.down(STABLE.java:316)
        at org.jgroups.protocols.VIEW_SYNC.down(VIEW_SYNC.java:204)
        at org.jgroups.protocols.pbcast.GMS.down(GMS.java:859)
        at org.jgroups.protocols.FC.handleDownMessage(FC.java:526)
        at org.jgroups.protocols.FC.down(FC.java:365)
        at org.jgroups.protocols.FRAG2.down(FRAG2.java:175)
        at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER.down(STREAMING_STATE_TRANSFER.java:303)
        at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:457)
        at org.jgroups.JChannel.down(JChannel.java:1443)
        at org.jgroups.JChannel.send(JChannel.java:620)
        at org.jgroups.tests.UnicastTest.sendMessages(UnicastTest.java:241)
        at org.jgroups.tests.UnicastTest.eventLoop(UnicastTest.java:198)
        at org.jgroups.tests.UnicastTest.main(UnicastTest.java:355)

      Found 1 deadlock.
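
      Reading the trace: "main" holds the UNICAST$Entry monitor from UNICAST.down(). Because of the CallerRunsPolicy rejection handler it ends up running the incoming task itself, and parks in handleDataReceived() on the ReentrantLock held by "Incoming-4". "Incoming-4" in turn sends a credit via FC.sendCredit() and waits for the same Entry monitor. The same kind of cycle (one monitor, one ReentrantLock) can be reproduced and detected programmatically with ThreadMXBean.findDeadlockedThreads(). The sketch below is an illustration only: the class, thread and lock names are made up, it needs Java 8+ for the lambdas, and findDeadlockedThreads() throws UnsupportedOperationException on JVMs that do not monitor ownable synchronizers.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockProbe {

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /**
     * Deliberately deadlocks two daemon threads (they stay blocked until the
     * JVM exits) and returns how many threads the JVM reports as deadlocked,
     * or 0 if nothing was reported within about five seconds.
     */
    static int provokeAndCount() {
        final Object entry = new Object();                      // stands in for the Entry monitor
        final ReentrantLock receiveLock = new ReentrantLock();  // stands in for the lock in handleDataReceived()
        final CountDownLatch bothHold = new CountDownLatch(2);

        Thread sender = new Thread(() -> {
            synchronized (entry) {              // like "main": holds the monitor while sending
                bothHold.countDown();
                awaitQuietly(bothHold);
                receiveLock.lock();             // blocks forever: owned by the receiver
            }
        }, "sender-sketch");

        Thread receiver = new Thread(() -> {
            receiveLock.lock();                 // like "Incoming-4": holds the lock, never released
            bothHold.countDown();
            awaitQuietly(bothHold);
            synchronized (entry) {              // blocks forever: monitor owned by the sender
            }
        }, "receiver-sketch");

        sender.setDaemon(true);
        receiver.setDaemon(true);
        sender.start();
        receiver.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findDeadlockedThreads();   // covers monitors and ownable synchronizers
            if (ids != null) {
                for (ThreadInfo info : mx.getThreadInfo(ids)) {
                    if (info != null) {
                        System.out.println(info.getThreadName() + " waits for "
                                + info.getLockName() + " held by " + info.getLockOwnerName());
                    }
                }
                return ids.length;
            }
            try {
                Thread.sleep(50);               // the threads may not be blocked yet, poll again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + provokeAndCount());
    }
}
```

      The latch only guarantees that each thread holds its first lock before trying the second one, so the cycle forms on every run; the polling loop covers the short window before both threads are actually blocked.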

              rhn-engineering-bban Bela Ban