ModeShape / MODE-1943

Simpler clustered indexing




      When setting up a ModeShape cluster, there are currently (as of ModeShape 3.2) three components that must each be clustered correctly:

      1. Infinispan cache
      2. ModeShape (for events)
      3. Hibernate Search (configured via the "query" section of the ModeShape configuration), using either JMS master/slave or JGroups master/slave

      Setting up Hibernate Search to cluster the indexes properly is far more difficult than the other two, which essentially involve just pointing each one at a JGroups configuration.

      ModeShape should support a third option for clustering the indexes (besides JMS and JGroups master/slave): each process maintains its own indexes, and both local and remote changes are reflected in them.
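
      A minimal, self-contained sketch of the proposed mode, under a deliberately simplified model: `SharedStore` stands in for the clustered Infinispan cache, `Cluster.save` stands in for persisting a change and broadcasting its ModeShape event, and each `ClusterMember` keeps a private in-memory index. All of these names are hypothetical and are not ModeShape or Hibernate Search APIs. The sketch only shows the core idea: every member handles every event, local or remote, by loading the changed node from the shared store and re-indexing it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of per-process ("fallback") clustered indexing.
// None of these types are ModeShape APIs.
public class PerProcessIndexingSketch {

    /** Stand-in for the shared, clustered content store (Infinispan in this issue). */
    static final class SharedStore {
        private final Map<String, String> nodes = new HashMap<>();
        void put(String key, String content) { nodes.put(key, content); }
        String load(String key) { return nodes.get(key); }
    }

    /** A change event: which member made the change and which node keys changed. */
    static final class ChangeEvent {
        final String originMember;
        final List<String> changedKeys;
        ChangeEvent(String originMember, List<String> changedKeys) {
            this.originMember = originMember;
            this.changedKeys = changedKeys;
        }
    }

    /** One cluster member that owns a private index (node key -> lower-cased text). */
    static final class ClusterMember {
        final String name;
        private final SharedStore store;
        private final Map<String, String> localIndex = new HashMap<>();
        int nodesLoaded = 0; // nodes this member had to materialize just to index them

        ClusterMember(String name, SharedStore store) {
            this.name = name;
            this.store = store;
        }

        /** Handles every event the same way, whether it originated here or elsewhere. */
        void onEvent(ChangeEvent event) {
            for (String key : event.changedKeys) {
                String content = store.load(key); // load the changed node from the shared store
                nodesLoaded++;
                localIndex.put(key, content.toLowerCase());
            }
        }

        /** Naive substring "query" against this member's own index only. */
        Set<String> query(String term) {
            Set<String> hits = new TreeSet<>();
            for (Map.Entry<String, String> entry : localIndex.entrySet()) {
                if (entry.getValue().contains(term.toLowerCase())) {
                    hits.add(entry.getKey());
                }
            }
            return hits;
        }
    }

    /** Stand-in for persisting a change and broadcasting its event to every member. */
    static final class Cluster {
        final SharedStore store = new SharedStore();
        final List<ClusterMember> members = new ArrayList<>();

        ClusterMember join(String name) {
            ClusterMember member = new ClusterMember(name, store);
            members.add(member);
            return member;
        }

        void save(ClusterMember origin, String key, String content) {
            store.put(key, content); // persist first (the "ISPN" step in this model)
            ChangeEvent event = new ChangeEvent(origin.name, List.of(key));
            for (ClusterMember member : members) {
                member.onEvent(event); // the origin and every remote member index the change
            }
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster();
        ClusterMember a = cluster.join("A");
        ClusterMember b = cluster.join("B");
        cluster.save(a, "/cars/prius", "Toyota Prius hybrid");
        cluster.save(b, "/cars/leaf", "Nissan Leaf electric");
        System.out.println(a.query("hybrid"));   // [/cars/prius]
        System.out.println(b.query("hybrid"));   // [/cars/prius]
        System.out.println(b.query("electric")); // [/cars/leaf]
    }
}
```

      The `nodesLoaded` counter makes the first drawback listed in this issue visible: every save forces every member to materialize the changed node, even though only one member made the change.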

      No configuration changes would be required: ModeShape would automatically use this "fallback" option whenever the repository is clustered but Hibernate Search is not (again, as configured in the ModeShape "query" section).
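
      The selection rule can be expressed with nothing more than the settings that already exist. In this sketch the enum, its constants, and `select` are hypothetical names, and treating an explicitly clustered "query" section as taking precedence is an assumption about how the cases would be ordered.

```java
// Hypothetical selection logic; the enum and method names are illustrative only.
public class IndexingModeSelector {

    enum IndexingMode {
        LOCAL,                      // repository not clustered: ordinary local indexes
        HIBERNATE_SEARCH_CLUSTERED, // "query" section configures JMS or JGroups master/slave
        PER_PROCESS_FALLBACK        // repository clustered, indexes not: every process indexes every change
    }

    /** Picks a mode from settings that already exist, so no new configuration is needed. */
    static IndexingMode select(boolean repositoryClustered, boolean querySectionClustered) {
        if (querySectionClustered) {
            return IndexingMode.HIBERNATE_SEARCH_CLUSTERED; // honor explicit index clustering
        }
        return repositoryClustered ? IndexingMode.PER_PROCESS_FALLBACK : IndexingMode.LOCAL;
    }

    public static void main(String[] args) {
        System.out.println(select(true, false));  // PER_PROCESS_FALLBACK
        System.out.println(select(true, true));   // HIBERNATE_SEARCH_CLUSTERED
        System.out.println(select(false, false)); // LOCAL
    }
}
```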

      The benefits are:

      • Configuring clustering becomes much easier, especially for small standalone Java SE applications.
      • There is very little lag between the time content changes are persisted (in ISPN) and the time those changes are reflected in the indexes. With JMS/JGroups clustered indexing, the refresh parameter defines how frequently the indexes are copied, and it cannot be set too small without causing a lot of churn, since the entire indexes are copied rather than just the changes.
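
      As a rough illustration of that churn, the following back-of-the-envelope sketch takes the issue's premise that every refresh copies the entire index. The 200 MB index size and the refresh values are made-up numbers, and `copiedMegabytesPerHour` is a hypothetical helper. Under that premise, the copy volume grows inversely with the refresh interval, no matter how little content actually changed.

```java
// Back-of-the-envelope churn estimate, assuming (as the issue states) that each
// refresh copies the whole index. All sizes and intervals are hypothetical.
public class RefreshChurnEstimate {

    /** Megabytes copied per hour by one slave when every refresh copies the whole index. */
    static double copiedMegabytesPerHour(double indexMegabytes, double refreshSeconds) {
        double refreshesPerHour = 3600.0 / refreshSeconds;
        return indexMegabytes * refreshesPerHour;
    }

    public static void main(String[] args) {
        double indexMb = 200.0; // hypothetical index size
        for (double refresh : new double[] {60.0, 10.0, 1.0}) {
            System.out.printf("refresh=%.0fs -> %.0f MB/hour per slave%n",
                    refresh, copiedMegabytesPerHour(indexMb, refresh));
        }
    }
}
```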

      There are some drawbacks:

      • After one process makes some changes to content and the events are sent to the other processes, every receiving process must load the changed nodes so that they can be indexed. This may create a lot of load on ISPN, since basically every process is materializing the recently changed nodes. (This may also be a good thing, as those materialized nodes are then cached and subsequent access will be quite fast.)
      • Should a process go down, changes persisted while it is down will not be reflected in its local indexes when it comes back up. Thus, before restarting the process, its local indexes should probably be removed so that they are completely rebuilt when the process starts. This increases the load on ISPN and the time before the process can correctly respond to queries.
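
      One way to carry out the "remove the local indexes before restarting" step is a small pre-start utility. This sketch assumes the process's indexes live in a single local directory; the class `LocalIndexReset` and the default path `target/indexes` are hypothetical, so substitute whatever location the "query" section actually configures. It only deletes files; the rebuild itself is left to ModeShape at startup, as described above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical pre-start step; run it only while the ModeShape process is stopped.
public class LocalIndexReset {

    /** Recursively deletes indexDir if it exists; returns true if anything was removed. */
    static boolean deleteIndexDirectory(Path indexDir) throws IOException {
        if (!Files.exists(indexDir)) {
            return false;
        }
        try (Stream<Path> paths = Files.walk(indexDir)) {
            // Reverse order visits deeper paths first, so children are deleted before parents.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Paths.get(args.length > 0 ? args[0] : "target/indexes"); // hypothetical location
        boolean removed = deleteIndexDirectory(indexDir);
        System.out.println(removed
                ? "Removed " + indexDir + "; indexes will be rebuilt on startup"
                : "No index directory at " + indexDir);
    }
}
```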

      As long as users are aware of these benefits and drawbacks, they can make an informed decision about which clustering option is more suitable for their own needs.




      People: Horia Chiorean, Randall Hauch