JGroups / JGRP-1460

Deadlock in PEER_LOCK

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 3.0.11, 3.1
    • Affects Version/s: 3.0.9
    • Steps to Reproduce:

      The deadlock can be reproduced reliably by using a debugger to control thread execution.
      Setup: two nodes, nodeA and nodeB, both using PEER_LOCK; nodeA is the coordinator.
      1. Set a breakpoint at org.jgroups.protocols.Locking line 565 (the line reads: "synchronized(client_locks) {").
      2. On nodeA, acquire a lock and then release it. During the unlock, the thread stops at the breakpoint on Locking line 565.
      3. Disconnect or kill nodeB.

      The deadlock then occurs.
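
The steps above force two threads to take the same pair of monitors in opposite order. The following is a minimal, JGroups-free sketch of that interleaving, not the actual PEER_LOCK code: the class name LockOrderDemo, the thread names, and the two stand-in monitors are hypothetical, and a CountDownLatch plays the role of the debugger breakpoint. It uses the standard ThreadMXBean to confirm that the JVM sees the cycle.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

class LockOrderDemo {

    /** Counts down, then waits until both threads hold their first monitor. */
    private static void arriveAndWait(CountDownLatch latch) {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) return true;
        }
        return false;
    }

    /** Returns true if the JVM reports both demo threads as monitor-deadlocked. */
    static boolean reproduce() {
        // Hypothetical stand-ins for the two monitors named in the report.
        final Map<String, Object> clientLocks = new HashMap<>(); // like Locking.client_locks
        final Object peerLock = new Object();                    // like the PeerLock instance
        final CountDownLatch bothHoldFirst = new CountDownLatch(2);

        // Unlock path: holds the peer lock, then wants the map.
        Thread unlocker = new Thread(() -> {
            synchronized (peerLock) {
                arriveAndWait(bothHoldFirst); // plays the role of the breakpoint
                synchronized (clientLocks) {
                    clientLocks.clear();
                }
            }
        }, "demo-unlocker");

        // View-change path: holds the map, then wants the peer lock.
        Thread viewHandler = new Thread(() -> {
            synchronized (clientLocks) {
                arriveAndWait(bothHoldFirst);
                synchronized (peerLock) {
                    clientLocks.clear();
                }
            }
        }, "demo-view-handler");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        unlocker.setDaemon(true);
        viewHandler.setDaemon(true);
        unlocker.start();
        viewHandler.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + 5000;
        while (System.currentTimeMillis() < deadline) {
            long[] ids = mx.findMonitorDeadlockedThreads(); // null when no cycle exists
            if (ids != null && contains(ids, unlocker.getId())
                    && contains(ids, viewHandler.getId())) {
                return true;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + reproduce());
    }
}
```

The latch makes the interleaving deterministic in the same way the breakpoint does: neither thread requests its second monitor until both already hold their first.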

    Description

      I tried 2.12.1.final and 3.1.0.Alpha3; the issue reproduces in both versions.

      With two nodes, one node performs lock/unlock while the other node is killed; a deadlock then occurs. "Steps to Reproduce" describes a way to reproduce the issue every time.

      (BTW, I have another issue related to TCP: I get the warning "[org.jgroups.protocols.TCP] discarded message from different cluster" and am not sure why there is traffic between two groups. I thought TCP would not send messages across groups; could you please help explain this? My email is freeliuade@yahoo.com.cn. Or please let me know of a forum where I can ask this question.)

      The following is the thread dump:

      Found one Java-level deadlock:
      =============================
      "ViewHandler,myClusterGroup,pek-wkst2cwxc-13306":
      waiting to lock monitor 0x16be6f7c (object 0x02b7e218, a org.jgroups.protocols.PEER_LOCK$PeerLock),
      which is held by "main"
      "main":
      waiting to lock monitor 0x16be6f14 (object 0x02f14118, a java.util.HashMap),
      which is held by "ViewHandler,myClusterGroup,pek-wkst2cwxc-13306"

      Java stack information for the threads listed above:
      ===================================================
      "ViewHandler,myClusterGroup,pek-wkst2cwxc-13306":
        at org.jgroups.protocols.PEER_LOCK$PeerLock.retainAll(PEER_LOCK.java:96)
        - waiting to lock <0x02b7e218> (a org.jgroups.protocols.PEER_LOCK$PeerLock)
        at org.jgroups.protocols.PEER_LOCK.handleView(PEER_LOCK.java:76)
        - locked <0x02f14118> (a java.util.HashMap)
        at org.jgroups.protocols.Locking.up(Locking.java:271)
        at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:178)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:181)
        at org.jgroups.protocols.pbcast.GMS.installView(GMS.java:630)
        at org.jgroups.protocols.pbcast.CoordGmsImpl.handleViewChange(CoordGmsImpl.java:247)
        at org.jgroups.protocols.pbcast.GMS.castViewChange(GMS.java:494)
        at org.jgroups.protocols.pbcast.CoordGmsImpl.handleMembershipChange(CoordGmsImpl.java:223)
        at org.jgroups.protocols.pbcast.GMS$ViewHandler.process(GMS.java:1387)
        at org.jgroups.protocols.pbcast.GMS$ViewHandler.run(GMS.java:1341)
        at java.lang.Thread.run(Thread.java:619)

      "main":
        at org.jgroups.protocols.Locking.removeClientLock(Locking.java:566)
        - waiting to lock <0x02f14118> (a java.util.HashMap)
        at org.jgroups.protocols.Locking$ClientLock._unlock(Locking.java:976)
        - locked <0x02b7e218> (a org.jgroups.protocols.PEER_LOCK$PeerLock)
        at org.jgroups.protocols.Locking$ClientLock.unlock(Locking.java:915)
        - locked <0x02b7e218> (a org.jgroups.protocols.PEER_LOCK$PeerLock)
        at org.jgroups.protocols.Locking.down(Locking.java:162)
        at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1022)
        at org.jgroups.JChannel.down(JChannel.java:729)
        at org.jgroups.blocks.locking.LockService$LockImpl.unlock(LockService.java:120)
        at JGroupJDBCPingTest.main(JGroupJDBCPingTest.java:101)

      Found 1 deadlock.
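
The dump shows "main" holding the PeerLock monitor while waiting for the HashMap, and the ViewHandler thread holding the HashMap while waiting for the PeerLock. One common remedy for this kind of inversion is to make every code path take the two monitors in the same order. The sketch below illustrates that idea only; it is not necessarily the change made in JGroups, and the class OrderedLocking, its methods, and its fields are hypothetical stand-ins rather than JGroups APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class OrderedLocking {
    // Hypothetical stand-ins: the map is always the outer monitor,
    // the peer lock is always the inner monitor.
    private final Map<String, Object> clientLocks = new HashMap<>();
    private final Object peerLock = new Object();

    /** Unlock path: map first, then peer lock. */
    void unlock(String name) {
        synchronized (clientLocks) {
            synchronized (peerLock) {
                clientLocks.remove(name);
            }
        }
    }

    /** View-change path: same order as unlock(), so no cycle can form. */
    void handleView() {
        synchronized (clientLocks) {
            synchronized (peerLock) {
                clientLocks.clear();
            }
        }
    }

    /** Runs both paths concurrently; returns true if both threads finish in time. */
    static boolean runConcurrently(int iterations) {
        OrderedLocking locking = new OrderedLocking();
        CountDownLatch done = new CountDownLatch(2);

        Thread unlocker = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                locking.unlock("lock-" + i);
            }
            done.countDown();
        }, "demo-unlocker");

        Thread viewHandler = new Thread(() -> {
            for (int i = 0; i < iterations; i++) {
                locking.handleView();
            }
            done.countDown();
        }, "demo-view-handler");

        unlocker.setDaemon(true);
        viewHandler.setDaemon(true);
        unlocker.start();
        viewHandler.start();

        try {
            return done.await(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("both paths completed: " + runConcurrently(10_000));
    }
}
```

An alternative with the same effect is to release the first monitor before requesting the second, so that no thread ever holds both at once.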

          People

            Assignee: Bela Ban (rhn-engineering-bban)
            Reporter: freeliuade (Inactive)