ModeShape / MODE-2665

Database deadlocks can occur when lock() and unlock() are called under certain conditions


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 5.4.0.Final
    • Affects Version/s: 5.2.0.Final
    • Component/s: JCR
    • Labels: None
    Steps to Reproduce

      I used 15 threads running the lock() ... unlock() code posted in the description against 6 instances of the JCR repository on an Oracle database. It is difficult to reproduce, and requires the following conditions to be met (a sketch of such a harness is shown after this list):

      • two threads race to call RepositoryLockManager.lock(...)
      • one thread succeeds, performs its work, and attempts to call RepositoryLockManager.unlock(...)
      • the second thread then retries acquiring the lock by calling RepositoryLockManager.lock(...)
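      A minimal sketch of such a stress harness, for illustration only: the bootstrapClusteredRepositories() helper, the credentials, the workspace name, and the lock(...) arguments (which are elided in the description) are assumptions, not part of the original report.

      import java.util.List;
      import java.util.concurrent.ExecutorService;
      import java.util.concurrent.Executors;

      import javax.jcr.Node;
      import javax.jcr.Repository;
      import javax.jcr.Session;
      import javax.jcr.SimpleCredentials;
      import javax.jcr.lock.LockManager;

      public class LockUnlockStressTest {

          public static void main(String[] args) throws Exception {
              // Assumed helper: returns the 6 clustered Repository instances, all
              // backed by the same Oracle database. Environment-specific, not shown.
              List<Repository> repositories = bootstrapClusteredRepositories();

              ExecutorService pool = Executors.newFixedThreadPool(15);
              for (int t = 0; t < 15; t++) {
                  final int threadId = t;
                  final Repository repository = repositories.get(t % repositories.size());
                  pool.submit(() -> {
                      Session session = repository.login(
                              new SimpleCredentials("admin", "admin".toCharArray()), "default");
                      // "parent" is assumed to exist already and to be mix:lockable.
                      Node parent = session.getRootNode().getNode("parent");
                      LockManager lockManager = session.getWorkspace().getLockManager();
                      for (int i = 0; i < 1000; i++) {
                          // The actual lock(...) arguments are elided in the description;
                          // a shallow, session-scoped lock is used here for illustration.
                          lockManager.lock(parent.getPath(), false, true, Long.MAX_VALUE, null);
                          try {
                              parent.addNode("child-" + threadId + "-" + i);
                              session.save();
                          } finally {
                              lockManager.unlock(parent.getPath());
                          }
                      }
                      session.logout();
                      return null;
                  });
              }
              pool.shutdown();
          }

          private static List<Repository> bootstrapClusteredRepositories() {
              throw new UnsupportedOperationException("environment-specific setup");
          }
      }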

    Description

      I have multiple threads running the code below, calling into a cluster of ModeShape repositories backed by an Oracle database:

      session.getWorkspace().getLockManager().lock(parent.getPath(), ...);
      try {
        parent.addNode("child-" + i);
      } finally {
        session.getWorkspace().getLockManager().unlock(parent.getPath());
      }
      

      I noticed the database deadlock while investigating hung threads. Going through the WritableSessionCache logs, it looks like the order in which database locks are acquired is determined by RepositoryLockManager.lock(...) and RepositoryLockManager.unlock(...), and that this ordering is contributing to the database deadlock.

      Here's the relevant output from WritableSessionCache's logging that demonstrates the deadlock:

      2017-01-31 18:26:09,438-0600 [add-node-parallel-10] DEBUG org.modeshape.jcr.cache.document.WritableSessionCache - Locked the nodes: [e60d4aa317f1e7mode:lock-e60d4aa7505d64e7751d4d-ff9a-45d3-8453-c51e87e4df36, e60d4aa317f1e7mode:locks]
      ...
      2017-01-31 18:26:09,439-0600 [add-node-parallel-6] DEBUG org.modeshape.jcr.cache.document.WritableSessionCache - Locked the nodes: [e60d4aa7505d64e7751d4d-ff9a-45d3-8453-c51e87e4df36]
      ...
      2017-01-31 18:26:09,448-0600 [add-node-parallel-10] DEBUG org.modeshape.jcr.cache.document.WritableSessionCache - Attempting to the lock nodes: [e60d4aa7505d64e7751d4d-ff9a-45d3-8453-c51e87e4df36]
      ...
      2017-01-31 18:26:09,449-0600 [add-node-parallel-6] DEBUG org.modeshape.jcr.cache.document.WritableSessionCache - Attempting to the lock nodes: [e60d4aa317f1e7mode:locks, e60d4aa317f1e7mode:lock-e60d4aa7505d64e7751d4d-ff9a-45d3-8453-c51e87e4df36]
      ...

      From looking through the source code of RepositoryLockManager, the lock(...) method saves the lock in the system session first, then saves the node being locked.

      unlock(...), on the other hand, saves the node being locked first, then deletes the lock itself.

      This means that when lock(...) and unlock(...) are called at around the same time, lock(...) acquires a database lock on the JCR lock itself and then waits for the database lock on the JCR node being locked to be released, while unlock(...) holds the database lock on the JCR node being locked and waits to acquire the database lock on the JCR lock itself. The two sessions wait on each other, which is the deadlock.
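
      This is the classic opposite-order (ABBA) lock-acquisition pattern. A self-contained sketch of that pattern, using two ReentrantLocks as stand-ins for the database row locks on the lock entry and on the node being locked (this is not ModeShape's actual code, only an illustration of the ordering):

      import java.util.concurrent.locks.ReentrantLock;

      public class LockOrderDeadlock {
          // Stand-ins for the two database rows touched by both code paths:
          //   lockEntryRow  ~ the JCR lock stored under mode:locks
          //   lockedNodeRow ~ the node being locked
          static final ReentrantLock lockEntryRow = new ReentrantLock();
          static final ReentrantLock lockedNodeRow = new ReentrantLock();

          public static void main(String[] args) {
              // Models lock(...): saves the lock entry first, then the node being locked.
              Thread locker = new Thread(() -> {
                  lockEntryRow.lock();
                  try {
                      pause();                // widen the race window
                      lockedNodeRow.lock();   // waits on the row held by unlocker
                      lockedNodeRow.unlock();
                  } finally {
                      lockEntryRow.unlock();
                  }
              });
              // Models unlock(...): saves the node being locked first, then deletes the lock entry.
              Thread unlocker = new Thread(() -> {
                  lockedNodeRow.lock();
                  try {
                      pause();
                      lockEntryRow.lock();    // waits on the row held by locker -> circular wait
                      lockEntryRow.unlock();
                  } finally {
                      lockedNodeRow.unlock();
                  }
              });
              locker.start();
              unlocker.start();               // both threads now hang, as the database sessions do
          }

          static void pause() {
              try { Thread.sleep(100); } catch (InterruptedException ignored) { }
          }
      }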

      Proposed solution:
      I believe that changing the save order in RepositoryLockManager's unlock(...) would resolve the deadlock. I forked ModeShape, made the change, and no longer saw any deadlocks or hung threads. I don't have enough context on why the save order is the way it is, so I am not sure whether this is the best/correct solution.
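
      For illustration only (reusing the stand-in locks from the sketch above, not the actual RepositoryLockManager code), the proposed change amounts to making unlock(...) touch the two rows in the same order as lock(...), which removes the circular wait:

      // unlock(...) reordered to match lock(...): lock entry first, locked node second.
      Thread unlockerReordered = new Thread(() -> {
          lockEntryRow.lock();            // same first step as lock(...)
          try {
              lockedNodeRow.lock();       // same second step -> no opposite-order wait
              try {
                  // ... delete the lock entry and save the locked node ...
              } finally {
                  lockedNodeRow.unlock();
              }
          } finally {
              lockEntryRow.unlock();
          }
      });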

          People

            Assignee: Horia Chiorean (hchiorean)
            Reporter: Reggie Y (regimus)
