[MODE-2662] Stale cache results in the inability to lock the node - Red Hat Issue Tracker

Type: Bug
Resolution: Won't Do
Priority: Major
Fix Version/s: 5.4.0.Final
Affects Version/s: 5.3.0.Final
Component/s: JCR
Labels: None

Steps to Reproduce:

Refer to the "Steps to Reproduce" section in the description of the JIRA.

Steps to Reproduce

Check out project [1] and run:

mvn clean verify -Dtest=LockingBehaviorTest#addNodesInOrderNoTransaction

Background

Assume a cluster of 10 members. Using a single thread of execution, add 100 child nodes to the appRoot, so that the repository structure looks like this:

- jcrRepositoryRoot
  -- appRoot
       --- childNode1
       ...
       --- childNode100

Where:

The appRoot and childNodeN are versioned.
Every time a new child is about to be added:
- Lock parent node.
- Add child.
- Unlock parent node.

To simulate load balancing (e.g. round-robin), each request to add a new node is handled by the next available member of the cluster, which is selected by a circular iterator over the members. For instance, with 3 cluster members and 4 nodes to add:

childNode1 -> member1
childNode2 -> member2
childNode3 -> member3
childNode4 -> member1
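
The circular selection described above can be sketched roughly as follows. This is only an illustration: the RoundRobin class and its next() method are hypothetical names, not taken from the test project [1], and plain strings stand in for cluster members.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not from the test project): hands out members in circular order. */
final class RoundRobin<T> {
    private final List<T> members;
    private int nextIndex = 0;

    RoundRobin(List<T> members) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("at least one member is required");
        }
        this.members = new ArrayList<>(members);
    }

    /** Returns the next member, wrapping around to the first one after the last. */
    T next() {
        T member = members.get(nextIndex);
        nextIndex = (nextIndex + 1) % members.size();
        return member;
    }
}

class RoundRobinDemo {
    public static void main(String[] args) {
        // Strings stand in for the real cluster members (an assumption of this sketch).
        RoundRobin<String> cluster =
                new RoundRobin<>(Arrays.asList("member1", "member2", "member3"));
        for (int i = 1; i <= 4; i++) {
            System.out.println("childNode" + i + " -> " + cluster.next());
        }
    }
}
```

With 3 members and 4 nodes, the fourth request wraps around to member1, matching the mapping listed above.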

Problem

At some point during the creation of the nodes, a LockException is thrown. The exception indicates that the parent node is locked, so a new node cannot be added. It can happen on any node, i.e. the order is not deterministic, but the exception occurs consistently. In my understanding, this should not happen unless there is a bug in ModeShape or a misconfiguration of JGroups.

Questions

1. With a single thread of execution and sequential successful lock/unlock operations, how is it possible for the parent node to remain locked on the next attempt to add a child node? Could it be that message delivery from one member of the cluster to others is too slow? To exemplify:

First member of the cluster:

Lock parent node, send notifications.
Add new child node, save, send notifications.
Unlock parent node, send notifications.

Second member of the cluster:

Attempt to lock parent node. However, the notification about node unlocking from the first member of the cluster has not arrived yet and the local cache indicates a locked status, therefore throw a LockException.
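
The suspected sequence can be reproduced in a small toy model. This is a sketch of the hypothesis only, not ModeShape or JGroups code: the Member class, its per-member lock cache, and the notification inbox are all assumptions made for illustration, and IllegalStateException stands in for the LockException.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy cluster member with a local lock cache (hypothetical, not ModeShape code). */
final class Member {
    private final String name;
    private final List<Member> peers = new ArrayList<>();
    // Notifications that were sent to this member but not applied yet.
    private final Queue<Boolean> inbox = new ArrayDeque<>();
    private boolean parentLockedInCache = false;

    Member(String name) {
        this.name = name;
    }

    void addPeer(Member peer) {
        peers.add(peer);
    }

    /** Locks the parent if the local cache says it is free; fails otherwise. */
    void lockParent() {
        if (parentLockedInCache) {
            throw new IllegalStateException(name + ": parent node appears to be locked");
        }
        update(true);
    }

    void unlockParent() {
        update(false);
    }

    /** Changes the local state and queues a notification for every peer. */
    private void update(boolean locked) {
        parentLockedInCache = locked;
        for (Member peer : peers) {
            peer.inbox.add(locked);
        }
    }

    /** Applies the oldest pending notification; returns false if none is pending. */
    boolean deliverNext() {
        Boolean locked = inbox.poll();
        if (locked == null) {
            return false;
        }
        parentLockedInCache = locked;
        return true;
    }
}

class StaleCacheDemo {
    static boolean tryLock(Member member) {
        try {
            member.lockParent();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Member first = new Member("member1");
        Member second = new Member("member2");
        first.addPeer(second);
        second.addPeer(first);

        first.lockParent();   // first member: lock, then (add child), then unlock
        first.unlockParent();

        second.deliverNext(); // only the "locked" notification has arrived so far
        System.out.println("lock before unlock notification: " + tryLock(second));

        second.deliverNext(); // the "unlocked" notification arrives
        System.out.println("lock after unlock notification: " + tryLock(second));
    }
}
```

In this model the second member's first attempt fails because its cache still holds the lock notification, and the retry succeeds once the unlock notification has been applied. Whether ModeShape actually behaves this way in a cluster is exactly what question 1 asks.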

2. Is this a problem with ModeShape, or is the custom JGroups file [2] simply misconfigured in one way or another?

[1] https://github.com/dnillia/modeshape-cluster-test
[2] https://github.com/dnillia/modeshape-cluster-test/blob/master/src/test/resources/test-jgroups.xml

Assignee: Horia Chiorean (Inactive)
Reporter: Illia Khokholkov (Inactive)
Votes: 0
Watchers: 2
Created: 2017/01/20 5:12 PM
Updated: 2017/02/06 4:19 AM
Resolved: 2017/02/06 4:19 AM
