Bug
Resolution: Won't Do
Major
5.3.0.Final
None
Steps to Reproduce
Check out project [1] and run:
mvn clean verify -Dtest=LockingBehaviorTest#addNodesInOrderNoTransaction
Background
Let us assume that we have a cluster that consists of 10 members. Using a single thread of execution, consider adding 100 child nodes to the appRoot, so that the repository structure looks like this:
- jcrRepositoryRoot
-- appRoot
--- childNode1
...
--- childNode100
Where:
- The appRoot and childNodeN are versioned.
- Every time a new child is about to be added:
  - Lock parent node.
  - Add child.
  - Unlock parent node.
To simulate load balancing (e.g. round-robin), every time a request to add a new node comes in, the next available member of the cluster is picked (a circular iterator returns the next available member). For instance, with 3 cluster members and 4 nodes to add:
childNode1 -> member1
childNode2 -> member2
childNode3 -> member3
childNode4 -> member1
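The member selection described above can be sketched as follows. This is a minimal illustration, not the code from the test project: the `RoundRobin` and `RoundRobinDemo` classes and the `assign` method are hypothetical names, and members are represented by plain strings instead of repository instances.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: hands out cluster members in a fixed circular order.
final class RoundRobin<T> {
    private final List<T> members;
    private int next = 0;

    RoundRobin(List<T> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("at least one member is required");
        }
        this.members = new ArrayList<>(members);
    }

    // Returns the current member and advances, wrapping around at the end.
    T next() {
        T member = members.get(next);
        next = (next + 1) % members.size();
        return member;
    }
}

final class RoundRobinDemo {
    // Builds one "childNodeN -> memberM" line per node to add.
    static List<String> assign(int memberCount, int nodeCount) {
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= memberCount; i++) {
            names.add("member" + i);
        }
        RoundRobin<String> members = new RoundRobin<>(names);
        List<String> assignments = new ArrayList<>();
        for (int n = 1; n <= nodeCount; n++) {
            assignments.add("childNode" + n + " -> " + members.next());
        }
        return assignments;
    }

    public static void main(String[] args) {
        // 3 members, 4 nodes: the fourth node wraps around to member1.
        assign(3, 4).forEach(System.out::println);
    }
}
```

Because the calls are made from a single thread, no synchronization is needed around `next()`; every request is fully handled by one member before the next member is chosen.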
Problem
At some point during the creation of the nodes, a LockException is thrown. The exception indicates that the parent node is locked, so a new node cannot be added. It can happen on any node, i.e. the failing node is not deterministic, but the exception occurs consistently. In my understanding, this should not happen unless there is a bug in ModeShape or a misconfiguration of JGroups.
Questions
1. With a single thread of execution and sequential successful lock/unlock operations, how is it possible for the parent node to remain locked on the next attempt to add a child node? Could it be that message delivery from one member of the cluster to others is too slow? To exemplify:
First member of the cluster:
- Lock parent node, send notifications.
- Add new child node, save, send notifications.
- Unlock parent node, send notifications.
Second member of the cluster:
- Attempt to lock parent node. However, the notification about node unlocking from the first member of the cluster has not arrived yet and the local cache indicates a locked status, therefore throw a LockException.
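The suspected sequence can be reproduced in a toy model, which may help when reasoning about question 1. This is a sketch under stated assumptions, not ModeShape's actual implementation: it assumes each member keeps a local view of the parent's lock state and updates it only when a notification from its peer is delivered. `Member`, `StaleLockDemo`, and `secondLockFails` are hypothetical names, and `IllegalStateException` stands in for `LockException`.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model (assumption): a member's view of the lock is only as fresh as
// the last notification it has processed.
final class Member {
    private final String name;
    private boolean parentLocked = false;                     // local view of the lock
    private final Queue<Boolean> inbox = new ArrayDeque<>();  // undelivered notifications

    Member(String name) {
        this.name = name;
    }

    // Lock according to the local view, then queue a notification for the peer.
    void lock(Member peer) {
        if (parentLocked) {
            // Stand-in for javax.jcr.lock.LockException.
            throw new IllegalStateException(name + ": parent node is locked");
        }
        parentLocked = true;
        peer.inbox.add(true);
    }

    void unlock(Member peer) {
        parentLocked = false;
        peer.inbox.add(false);
    }

    // Apply at most `count` pending notifications, in the order they were sent.
    void deliver(int count) {
        for (int i = 0; i < count && !inbox.isEmpty(); i++) {
            parentLocked = inbox.poll();
        }
    }
}

final class StaleLockDemo {
    // member1 locks and unlocks; member2 then tries to lock after only
    // `delivered` of the two notifications have arrived.
    static boolean secondLockFails(int delivered) {
        Member member1 = new Member("member1");
        Member member2 = new Member("member2");
        member1.lock(member2);
        member1.unlock(member2);
        member2.deliver(delivered);
        try {
            member2.lock(member1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unlock still pending: fails = " + secondLockFails(1));
        System.out.println("both delivered:       fails = " + secondLockFails(2));
    }
}
```

In this model the second member fails only in the window where the lock notification has arrived but the unlock notification has not. If the real cluster behaves this way, the failure would depend on delivery timing, which would be consistent with the non-deterministic node on which the exception occurs.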
2. Is it a problem with ModeShape, or is the custom JGroups file [2] simply misconfigured in one way or another?
[1] https://github.com/dnillia/modeshape-cluster-test
[2] https://github.com/dnillia/modeshape-cluster-test/blob/master/src/test/resources/test-jgroups.xml