Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: 3.4
Affects Version/s: 3.3
Labels: None
Affects: Release Notes
Estimated Difficulty: Medium
Workaround: Workaround Exists
Workaround Description: Rollback to 3.3.0.CR1

We upgraded from 3.3.0.CR1 to 3.3.0.Final and began experiencing all sorts of strange lock acquisition problems. The symptoms are:

(a) tryLock() hangs intermittently
(b) tryLock(timeout) times out without acquiring the lock, even though it should succeed, since the lock is only ever requested from a single node

This happens both with CENTRAL_LOCK as well as PEER_LOCK. I have attached the configuration we are using.

3.3.0.CR1 worked fine. This bug seems to have been introduced by JGRP-1610. I have carefully reviewed the code changes introduced by that fix, and they seem to be:

(i) OOB is used for lock messages. This should not be causing problems.
(ii) A striped ReentrantLock table is used instead of synchronized blocks. By itself, this change should not be causing problems.
(iii) Much tighter locking around the server lock table. I think this is where something goes wrong, and deadlocks end up occurring.
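Item (ii) replaces synchronized blocks with a striped lock table. The following is a minimal sketch of that technique, assuming a fixed array of ReentrantLocks indexed by the lock name's hashCode(); the class and method names are hypothetical and this is not the actual Locking.java code. It also shows why the stripe index needs care: Java's `%` operator keeps the sign of the dividend, so a lock name with a negative hashCode() can produce a negative, out-of-bounds index, which is the failure named by the linked duplicate, JGRP-1639.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical striped lock table (illustrative, not the Locking.java code):
// a fixed array of ReentrantLocks, with each lock name mapped to one stripe.
class StripedLockTable {
    private final ReentrantLock[] stripes;

    StripedLockTable(int numStripes) {
        stripes = new ReentrantLock[numStripes];
        for (int i = 0; i < numStripes; i++) {
            stripes[i] = new ReentrantLock();
        }
    }

    // Naive mapping: % keeps the sign of the dividend, so a negative
    // hashCode() can yield a negative, out-of-bounds index.
    static int naiveIndex(String name, int numStripes) {
        return name.hashCode() % numStripes;
    }

    // Safe mapping: Math.floorMod always returns a value in [0, numStripes).
    // (Math.abs(hashCode()) is not safe on its own: Math.abs(Integer.MIN_VALUE)
    // is still negative.)
    static int safeIndex(String name, int numStripes) {
        return Math.floorMod(name.hashCode(), numStripes);
    }

    // Runs an action while holding the stripe that guards the given lock name.
    void withStripe(String name, Runnable action) {
        ReentrantLock stripe = stripes[safeIndex(name, stripes.length)];
        stripe.lock();
        try {
            action.run();
        } finally {
            stripe.unlock();
        }
    }

    // Finds a hypothetical name "lock-N" whose naive index is negative.
    static String findNameWithNegativeIndex(int numStripes) {
        for (int i = 0; i < 1_000_000; i++) {
            String candidate = "lock-" + i;
            if (naiveIndex(candidate, numStripes) < 0) {
                return candidate;
            }
        }
        throw new IllegalStateException("no such name found");
    }

    public static void main(String[] args) {
        int n = 16;
        String name = findNameWithNegativeIndex(n);
        System.out.println(name + ": naive=" + naiveIndex(name, n) + ", safe=" + safeIndex(name, n));
        StripedLockTable table = new StripedLockTable(n);
        table.withStripe(name, () -> System.out.println("handled " + name));
        try {
            table.stripes[naiveIndex(name, n)].lock(); // array access throws first
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("naive index fails: " + e.getMessage());
        }
    }
}
```

Striping keeps unrelated lock names from contending on a single monitor, but every handler then depends on the index computation being correct for every possible name.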

The following methods in Locking.java previously had no synchronized block at all, and are now protected by the striped ReentrantLocks:

handleLockRequest()
handleAwaitRequest()
handleDeleteAwaitRequest()
handleSignalRequest()

This is the major change I see that could introduce deadlocks. The other methods, which were already synchronized before (handleCreateLockRequest, handleDeleteLockRequest, handleCreateAwaitingRequest, handleDeleteAwaitingRequest), are now stripe-locked instead, which should not be the cause of problems.
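If a handler such as handleLockRequest() fails while selecting its stripe (for example, the out-of-bounds exception for a lock name with a negative hashCode() described by the linked duplicate, JGRP-1639), and the receiving side drops the request instead of replying, the requester never sees a grant: tryLock() waits indefinitely and tryLock(timeout) simply expires. The following is a hypothetical model of that chain, not JGroups code. A CompletableFuture stands in for the grant message, a single daemon thread stands in for the lock server, and the "log and drop on exception" behavior is an assumption.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model (not JGroups code) of a lock request whose grant is lost.
class LostGrantDemo {
    private static final int NUM_STRIPES = 16;
    private static final ReentrantLock[] STRIPES = new ReentrantLock[NUM_STRIPES];

    static {
        for (int i = 0; i < NUM_STRIPES; i++) {
            STRIPES[i] = new ReentrantLock();
        }
    }

    // Daemon "server" thread that processes incoming requests one at a time.
    private static final ExecutorService SERVER = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "lock-server");
        t.setDaemon(true);
        return t;
    });

    // Server side: picks a stripe with the naive % mapping, then grants the lock.
    // Assumption: an exception here is logged and the request is dropped (no reply).
    static void handleLockRequest(String name, CompletableFuture<Boolean> reply) {
        try {
            ReentrantLock stripe = STRIPES[name.hashCode() % NUM_STRIPES]; // throws if index < 0
            stripe.lock();
            try {
                reply.complete(true); // send the grant
            } finally {
                stripe.unlock();
            }
        } catch (RuntimeException e) {
            System.err.println("request for " + name + " dropped: " + e);
        }
    }

    // Client side: sends the request and waits up to timeoutMs for the grant.
    static boolean tryLock(String name, long timeoutMs) {
        CompletableFuture<Boolean> reply = new CompletableFuture<>();
        SERVER.execute(() -> handleLockRequest(name, reply));
        try {
            return reply.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return false; // no grant arrived in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Finds a hypothetical name "lock-N" whose naive index is (or is not) negative.
    static String findName(boolean negativeIndex) {
        for (int i = 0; i < 1_000_000; i++) {
            String candidate = "lock-" + i;
            boolean negative = candidate.hashCode() % NUM_STRIPES < 0;
            if (negative == negativeIndex) {
                return candidate;
            }
        }
        throw new IllegalStateException("no matching name found");
    }

    public static void main(String[] args) {
        String ok = findName(false);
        String bad = findName(true);
        System.out.println(ok + " granted: " + tryLock(ok, 1000));
        System.out.println(bad + " granted: " + tryLock(bad, 200));
    }
}
```

In this model nothing deadlocks. The request for the "bad" name is lost, which from the requester's side looks the same as a hang or a timeout.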

I would have liked to provide steps to reproduce, but the failure is quite random, although it is consistent enough that we see it every single time we deploy our app.
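If the failure depends on the lock name, as the linked duplicate JGRP-1639 suggests, it would look random from the application's point of view. One way to hunt for a single-node reproduction is to sweep many distinct lock names and record which ones fail. The following is a minimal sketch that assumes only the java.util.concurrent.locks.Lock interface. `lockFactory` is a hypothetical parameter where a real test would plug in the distributed lock (for example, a method reference to JGroups' `LockService.getLock`, assuming it returns a `Lock`). Here, local ReentrantLocks stand in. The timed tryLock makes a hang show up as a failure instead of blocking the sweep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical single-node harness: acquires and releases many distinctly
// named locks and reports the names whose tryLock(timeout) failed.
class LockNameSweep {
    static List<String> sweep(Function<String, Lock> lockFactory, int numNames, long timeoutMs) {
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < numNames; i++) {
            String name = "lock-" + i;
            Lock lock = lockFactory.apply(name);
            boolean acquired;
            try {
                acquired = lock.tryLock(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break; // stop sweeping if interrupted
            }
            if (acquired) {
                lock.unlock();
            } else {
                failures.add(name + " (hashCode=" + name.hashCode() + ")");
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in for the distributed lock: one local ReentrantLock per name.
        Map<String, Lock> local = new ConcurrentHashMap<>();
        List<String> failures = sweep(n -> local.computeIfAbsent(n, k -> new ReentrantLock()), 1000, 100);
        System.out.println("failed names: " + failures); // prints "failed names: []"
    }
}
```

With the real lock service plugged in, listing the hashCode() next to each failed name makes it easy to see whether the failures correlate with negative hash values.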


Attachments:
AbstractJdkLockManager.java (3 kB, 2013/05/29 10:08 PM)
DistributedJGroupsLockManager.java (3 kB, 2013/05/29 10:08 PM)
jgroups.xml (3 kB, 2013/05/28 7:24 PM)
LockingTest.java (1 kB, 2013/05/29 10:08 PM)

Issue Links:
duplicates: JGRP-1639 Locking: lock name with negative hashCode() throws out of bound exception (Resolved)
relates to: JGRP-1610 LockingService and rpc on the same cluster, tryLock() hangs (Resolved)

Assignee: Bela Ban
Reporter: Manuel Dominguez Sarmiento (Inactive)
Votes: 0
Watchers: 2
Created: 2013/05/28 7:21 PM
Updated: 2013/06/28 10:11 AM
Resolved: 2013/06/28 10:11 AM
