JGroups / JGRP-465

Deadlock in FC if RPC response blocks


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 2.4.1 SP2
    • Affects Version/s: 2.4.1 SP1
    • Labels: None
    • Component/s: Documentation (Ref Guide, User Guide, etc.), Compatibility/Configuration

      In 2.4.1 SP1 (and probably earlier), FC can deadlock if up_thread and down_thread are set to false for FC and every protocol above it, and an incoming RPC loops its response back down the channel.

      The following stack trace shows the deadlock (note that the FC line numbers differ from the CVS code; this occurred with a patched FC version, but the patch is not relevant to this error):

      "IncomingPacketHandler (channel=Tomcat-Cluster)" daemon prio=1 tid=0xc8a68b60 nid=0x1ece in Object.wait() [0xc94d1000..0xc94d1f30]
      at java.lang.Object.wait(Native Method)
      at EDU.oswego.cs.dl.util.concurrent.CondVar.timedwait(CondVar.java:222)

      • locked <0xd187e398> (a EDU.oswego.cs.dl.util.concurrent.CondVar)
        at org.jgroups.protocols.FC.handleDownMessage(FC.java:394)
        at org.jgroups.protocols.FC.down(FC.java:336)
        at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
        at org.jgroups.protocols.FC.receiveDownEvent(FC.java:330)
        at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
        at org.jgroups.protocols.FRAG2.down(FRAG2.java:167)
        at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
        at org.jgroups.stack.Protocol.passDown(Protocol.java:551)
        at org.jgroups.protocols.pbcast.STATE_TRANSFER.down(STATE_TRANSFER.java:294)
        at org.jgroups.stack.Protocol.receiveDownEvent(Protocol.java:517)
        at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:385)
        at org.jgroups.JChannel.down(JChannel.java:1231)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:790)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.passDown(MessageDispatcher.java:767)
        at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:693)
        at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:544)
        at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:367)
        at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:777)
        at org.jgroups.JChannel.up(JChannel.java:1091)
        at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:377)
        at org.jgroups.stack.ProtocolStack.receiveUpEvent(ProtocolStack.java:393)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.pbcast.STATE_TRANSFER.up(STATE_TRANSFER.java:158)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.FRAG2.up(FRAG2.java:197)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.FC.up(FC.java:377)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.pbcast.GMS.up(GMS.java:768)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.protocols.pbcast.GMS.receiveUpEvent(GMS.java:788)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:260)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.UNICAST.handleDataReceived(UNICAST.java:476)
      • locked <0xd1eec6d8> (a org.jgroups.protocols.UNICAST$Entry)
        at org.jgroups.protocols.UNICAST.up(UNICAST.java:206)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.pbcast.NAKACK.up(NAKACK.java:569)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:170)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.FD.up(FD.java:300)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:301)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.MERGE2.up(MERGE2.java:162)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.Discovery.up(Discovery.java:225)
        at org.jgroups.stack.Protocol.receiveUpEvent(Protocol.java:488)
        at org.jgroups.stack.Protocol.passUp(Protocol.java:538)
        at org.jgroups.protocols.TP.handleIncomingMessage(TP.java:908)
        at org.jgroups.protocols.TP.handleIncomingPacket(TP.java:850)
        at org.jgroups.protocols.TP.access$400(TP.java:45)
        at org.jgroups.protocols.TP$IncomingPacketHandler.run(TP.java:1296)
        at java.lang.Thread.run(Thread.java:595)

      The thread carried an RPC up to the RpcDispatcher and is now waiting in FC.handleDownMessage() for credits to become available so it can send the RPC response. Those credits will never arrive, because the blocked thread is the very thread that would have to deliver them.
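      For context, here is a minimal, illustrative sketch of the kind of usage that can trigger this. The class name, group name and message sizes are made up, the JGroups 2.4 RpcDispatcher API is assumed, and the relevant stack configuration (up_thread=false/down_thread=false on FC and the protocols above it) is only indicated in comments:

          import org.jgroups.JChannel;
          import org.jgroups.blocks.GroupRequest;
          import org.jgroups.blocks.RpcDispatcher;

          public class FcDeadlockSketch {

              // RPC target. When up_thread/down_thread are disabled at and above FC,
              // the return value travels back down through FC on the same thread
              // that delivered the request.
              public Object handle(byte[] payload) {
                  return new byte[64 * 1024]; // a non-trivial response consumes FC credits
              }

              public static void main(String[] args) throws Exception {
                  // In the failing configuration the 2.4-style properties string sets
                  // up_thread=false;down_thread=false on FC and every protocol above it,
                  // e.g. ...:FC(max_credits=2000000;up_thread=false;down_thread=false):...
                  JChannel channel = new JChannel();
                  RpcDispatcher disp = new RpcDispatcher(channel, null, null, new FcDeadlockSketch());
                  channel.connect("Tomcat-Cluster");

                  // Under load, once the sender runs out of FC credits, the thread that
                  // delivers an incoming request can block in FC.handleDownMessage()
                  // while sending its response -- and is then unable to deliver the
                  // credit replenishment it is waiting for.
                  disp.callRemoteMethods(null, "handle", new Object[]{new byte[64 * 1024]},
                                         new Class[]{byte[].class}, GroupRequest.GET_ALL, 10000);
              }
          }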

      This is less of an issue in 2.5, because 2.5 uses the concurrent stack and credit replenishments are sent as OOB messages on a separate thread. (It would still be an issue in 2.5 if the concurrent stack were disabled via configuration.)

      A somewhat hacky solution is to flag the up thread and, in handleDownMessage(), check for that flag before blocking. If the flag is set, don't block the thread; just let it through, i.e. let it exceed max_credits.
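      One way such a flag could be implemented is with a ThreadLocal marker. The sketch below is illustrative only (the class and method names are not JGroups code); it shows the idea of marking the thread that carries a message up and letting that thread bypass the credit wait:

          // Illustrative only -- not JGroups code. A ThreadLocal marks threads that
          // are currently carrying a message up the stack; the credit check lets
          // such a thread through instead of blocking it, at the cost of
          // temporarily exceeding max_credits.
          public class CreditGate {

              private static final ThreadLocal<Boolean> UP_THREAD = new ThreadLocal<Boolean>();

              private long credits;

              public CreditGate(long maxCredits) {
                  this.credits = maxCredits;
              }

              /** Called by the up() path before handing the message to the application. */
              public static void markUpThread()   { UP_THREAD.set(Boolean.TRUE); }
              public static void unmarkUpThread() { UP_THREAD.set(null); }

              /** Called by the down() path (the handleDownMessage() equivalent). */
              public synchronized void decrement(long length) throws InterruptedException {
                  boolean callerIsUpThread = Boolean.TRUE.equals(UP_THREAD.get());
                  // An up thread must never block here: it is the only thread that
                  // could deliver the credit replenishment it would be waiting for.
                  while (credits < length && !callerIsUpThread) {
                      wait();
                  }
                  credits -= length; // may go negative for up threads
              }

              /** Called when a credit replenishment arrives. */
              public synchronized void replenish(long amount) {
                  credits += amount;
                  notifyAll();
              }
          }

      The up() path would call markUpThread()/unmarkUpThread() around passing the message up, so any down message sent from within that up handling sees the flag.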

      In cases like recent JBC releases, where the vast majority of RPC responses are lightweight "null" responses, this workaround is fairly safe. A config flag is still needed to disable the workaround, though, for applications whose RPC responses frequently return large amounts of data.
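      Such a flag might be exposed as an FC property in the stack configuration, for example (the property name here is illustrative, not a committed name):

          FC(max_credits=2000000;min_threshold=0.10;ignore_synchronous_response=false)

      where setting it to false would turn the bypass off and keep the current blocking behavior.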

              Assignee: Brian Stansberry (bstansbe@redhat.com)
              Reporter: Brian Stansberry (bstansbe@redhat.com)
