  1. ModeShape
  2. MODE-2617

Exception on concurrent write operations for versionable nodes in a clustered mode

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: 5.1.0.Final
    • Fix Version/s: 5.2.0.Final
    • Component/s: JCR
    • Labels:
      None

      Description

      Setup

      Consider the following layout where all nodes except for <jcrRepositoryRoot> are versionable:

      <jcrRepositoryRoot>
      - <applicationRoot>
      -- <parentNode1>
      --- <childNode1>
      -- <parentNode2>
      --- <childNode2>
      
      ...
      
      -- <parentNodeN>
      --- <childNodeN>
      

      ModeShape is running in clustered mode with two members in the JGroups cluster (the test project [1] uses a custom JGroups configuration file). The members can reside either in a single JVM (as multiple javax.jcr.Repository instances deployed to the org.modeshape.jcr.ModeShapeEngine) or in multiple JVMs (i.e. one javax.jcr.Repository per JVM).

      Problem

      Say we have N java.util.concurrent.Callable<V> instances. When called, each one obtains a javax.jcr.Session from the first member of the cluster and updates a single <childNodeN> that exists under the corresponding <parentNodeN> (i.e. no two instances are configured to update the same child or parent node). A java.util.concurrent.ExecutorService then invokes all of these tasks in parallel using a pool of N threads, which results in an exception that looks like this:

      Caused by: org.modeshape.jcr.cache.NodeNotFoundInParentException: Cannot locate child node: 497b1b6317f1e7209d1359-9d56-4a28-8dd0-59749b1ea35c within parent: 497b1b6317f1e7135adb68-36ac-4057-a084-650ce1cf274a
      	at org.modeshape.jcr.cache.document.SessionNode.getSegment(SessionNode.java:435)
      	at org.modeshape.jcr.cache.document.SessionNode.getPath(SessionNode.java:466)
      	at org.modeshape.jcr.JcrSession.node(JcrSession.java:553)
      	at org.modeshape.jcr.JcrSession.node(JcrSession.java:518)
      	at org.modeshape.jcr.JcrVersionManager.checkin(JcrVersionManager.java:378)
      	at org.modeshape.jcr.JcrVersionManager.checkin(JcrVersionManager.java:295)
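
The concurrency pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the test project: the class name, the NodeUpdater interface, and the node paths are hypothetical, and the JCR work itself is left behind the interface so that the sketch compiles without ModeShape on the classpath. In the real scenario each update would obtain a javax.jcr.Session from the first cluster member, modify the child node, save, and check the node in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ConcurrentUpdateSketch {

    /**
     * Hypothetical stand-in for the per-node JCR work. A real implementation
     * would log in to the first cluster member, change the node at childPath,
     * call Session.save(), and check the node in via the VersionManager.
     */
    public interface NodeUpdater {
        void update(String childPath) throws Exception;
    }

    /**
     * Builds n tasks, each targeting a distinct child under a distinct parent,
     * runs them in parallel on a pool of n threads, and returns the failures.
     */
    public static List<Throwable> runInParallel(int n, NodeUpdater updater)
            throws InterruptedException {
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            // Illustrative path only; no two tasks share a child or parent node.
            final String childPath = "/applicationRoot/parentNode" + i + "/childNode" + i;
            tasks.add(() -> {
                updater.update(childPath);
                return null;
            });
        }

        ExecutorService pool = Executors.newFixedThreadPool(n);
        List<Throwable> failures = new ArrayList<>();
        try {
            // invokeAll blocks until every task has completed or failed.
            for (Future<Void> future : pool.invokeAll(tasks)) {
                try {
                    future.get();
                } catch (ExecutionException e) {
                    // The cause is whatever the task threw while updating its node.
                    failures.add(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return failures;
    }

    public static void main(String[] args) throws Exception {
        // No-op updater, used here only to show how the harness is called.
        List<Throwable> failures = runInParallel(4, path -> { });
        System.out.println("failures: " + failures.size());
    }
}
```

Collecting the cause of each ExecutionException, rather than stopping at the first one, makes it easier to see how many of the N tasks fail in a given run.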
      

      The problem does not happen if:

      • there is only one member in the cluster
      • there are two cluster members, but only one thread of execution performs the update using the first member of the cluster

      Note that any kind of write operation results in the same error (every child node is unique and exists under a unique parent):

      • updating all child nodes in parallel
      • creating all child nodes in parallel

      Possible Cause

      Potentially, the JGroups configuration file used to recreate the problem has some invalid configuration options. For what it is worth, when the TRACE log level is enabled, I can see that the members of the cluster communicate with each other, i.e. there are send/receive entries.

      How to Recreate the Failure

      I have created a test project [1] that consistently reproduces the problem described above. Please read the documentation [2], which should explain how to use the tools available in the project, describe the problem in more detail, and list the exact steps to reproduce it.

      [1] https://github.com/dnillia/modeshape-cluster-test
      [2] https://github.com/dnillia/modeshape-cluster-test/blob/master/README.md

              People

              • Assignee:
                hchiorean Horia Chiorean
                Reporter:
                illia.khokholkov Illia Khokholkov