  1. JGroups
  2. JGRP-1610

LockingService and rpc on the same cluster, tryLock() hangs


    • Type: Feature Request
    • Resolution: Done
    • Priority: Major
    • 3.3
    • None
    • None

      Hi,

      Yes, the sequence diagram only depicted the second part of my description.

      Anyway, I've attached a test file that reproduces the problem.

      It contains two test cases, one where the coordinator of the lock is the one who
      sends the message first, and a second case where the non-coordinator sends
      the message first.

      In the first case, the receiver (the non-coordinator) hangs in tryLock. In the
      second case, though, everything works fine.

      Regards,
      Daniel Olausson

      On 25 March 2013 16:15, Bela Ban <belaban@yahoo.com> wrote:

      Hi Daniel,

      the sequence diagram differs from your description. Can you submit a
      test case (e.g. copy MessageDispatcherRSVPTest and modify it), so I can
      take a look?

      I assume your RPCs are blocking (sync) and non-OOB? This could be a
      recursive invocation, where FIFO order (the default) leads to a distributed
      deadlock.
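
      The suspected mechanism can be illustrated without JGroups. The sketch
      below is a plain-Java simulation, not JGroups code: the class name, the
      simulate() method and the single-threaded executor standing in for a
      FIFO delivery thread are all assumptions made for illustration. The RPC
      handler waits for a lock reply that is queued behind it on the same
      thread. A timeout replaces the indefinite wait so that the program
      terminates.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical simulation of the suspected deadlock; none of these names are JGroups API.
class FifoDeadlockSketch {

    // Returns true if the RPC handler received the lock reply before its timeout.
    static boolean simulate(boolean handOffToOtherThread) throws Exception {
        // One thread delivering messages strictly in order (stand-in for FIFO delivery).
        ExecutorService delivery = Executors.newSingleThreadExecutor();
        // Separate pool, loosely analogous to handling a request out of band.
        ExecutorService otherPool = Executors.newCachedThreadPool();

        CompletableFuture<Boolean> lockReply = new CompletableFuture<>();
        CompletableFuture<Boolean> rpcResult = new CompletableFuture<>();

        // The RPC handler: stands in for a remote method that calls tryLock()
        // and therefore waits for a lock reply to be delivered.
        Runnable rpcHandler = () -> {
            try {
                // A real untimed tryLock() would wait here indefinitely.
                rpcResult.complete(lockReply.get(500, TimeUnit.MILLISECONDS));
            } catch (TimeoutException e) {
                rpcResult.complete(false); // the reply never reached the handler
            } catch (Exception e) {
                rpcResult.completeExceptionally(e);
            }
        };

        // Message 1: the RPC request.
        if (handOffToOtherThread) {
            delivery.execute(() -> otherPool.execute(rpcHandler));
        } else {
            delivery.execute(rpcHandler); // handler blocks the delivery thread
        }
        // Message 2: the lock reply, queued behind message 1.
        delivery.execute(() -> lockReply.complete(true));

        try {
            return rpcResult.get(2, TimeUnit.SECONDS);
        } finally {
            delivery.shutdownNow();
            otherPool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("handler on delivery thread got reply: " + simulate(false));
        System.out.println("handler on separate thread got reply: " + simulate(true));
    }
}
```

      When the handler runs on the delivery thread, the reply cannot be
      delivered until the handler returns, so the wait times out. When the
      handler is handed off to another thread, the reply is delivered and the
      handler completes. Whether this is what actually happens in the reported
      case is exactly what the requested test case would show.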

      A test case would clarify what you want to do, and if I can reproduce
      the problem, I can fix it.

      On 3/25/13 1:54 PM, Daniel Olausson wrote:
      > Hi,
      >
      > We are trying to use the same channel for our lockingService and
      > rpcDispatcher, but we are noticing some weird behavior.
      >
      > The end result is that lock.tryLock(lockName) never returns, which it
      > should always do.
      >
      > This happens when we do the following:
      >
      > On computer A, we lock the lock.
      > Do an RPC to a function on computer B. This function tries to take the
      > lock (lock.tryLock(lockName)), but it can't because the lock is held.
      > This is the correct behavior.
      > Computer A unlocks the lock.
      >
      > On computer B we now do the same procedure: we lock the lock and do an
      > RPC to computer A, but this is where the strange thing happens. Computer
      > A tries to take the lock by executing tryLock, but it never returns.
      >
      > Here is a sequence diagram:
      > http://www.websequencediagrams.com/cgi-bin/cdraw?lz=dGl0bGUgQXV0aGVudGljYXRpb24gU2VxdWVuY2UKCkNoYW5uZWwgMSAtPiAABAk6IGNlbnRyYWxMb2NrLnRyeUxvY2soKQAiDS0-KwAoCTI6IHJwY0Rpc3BhdGhlci5jYWxsbWV0aG9kKCJmb28iKQBfCTIAXAwyAFAYbm90ZSByaWdodCBvZiAiAF4JIjogAIEECSBibG9ja3MgZm9yZXZlcgBWDC0-LQCBQAxmb28gcmV0dXJucwoK&s=default
      >
      >
      > In this example we use the standard udp.xml with <CENTRAL_LOCK/> added
      > at the top of the stack. Everything works if we use PEER_LOCK, but then
      > we need the messages to arrive in the same order everywhere, e.g. atomic
      > broadcast.
      >
      > It also works if we use different clusters for locking and rpc, but it
      > would be convenient if we could use the same cluster.
      >
      >
      > Is it recommended to use the same channel for different services?
      >

        1. RpcLockingTest.java
          10 kB
          Bela Ban
        2. RpcLockingTest.java
          11 kB
          Bela Ban

              rhn-engineering-bban Bela Ban
              rhn-engineering-bban Bela Ban