Details

Type: Feature Request
Resolution: Done
Priority: Blocker
Fix Version/s: 4.0.0.Alpha1
Affects Version/s: 3.5.0.Final
Component/s: Clustering, JCR, Query
Labels: None

Git Pull Request:
https://github.com/ModeShape/modeshape/pull/1077

Description

ModeShape 3.x maintains a single Lucene index for each repository, but there are several issues with this:

In a clustered deployment, maintaining and synchronizing the separate Lucene indexes is difficult, complex, and error-prone, and the indexes can be inconsistent (since the master content is transferred to the slaves only periodically).
The Lucene index is used for querying all fields, even when the values and criteria don't involve search (e.g., numeric fields, exact text matches, pattern matches, etc.). Lucene is not ideal for these kinds of queries, whereas traditional indexes (e.g., based upon B*-tree or similar) would be far more efficient and effective.
Using a single Lucene index for a whole repository is far from ideal, and it leads to concurrency problems (even in a local, non-clustered case).
When nodes are changed, the whole document in Lucene must be updated. This means we can't really update the Lucene index with just what's changed, and thus updating the index requires accessing the node rather than just working from the events.
Using Lucene (and Hibernate Search) adds a number of dependencies and complicates the build process, especially for the EAP kit.
We're currently indexing all properties. Doing so does mean that users can use any properties in their criteria, but it also means that the indexes are large and updating/replicating them takes longer. Ideally, we can offer the ability to index only specific properties that are actually used in query criteria. Doing this with Lucene would be difficult.
When a process in a cluster leaves the cluster (e.g., is taken down) and then (re)joins the cluster, ModeShape has no option other than to completely reindex the content (or, if master-slave is used, to copy the indexes, though this copying leads to other inconsistencies).

The objective of this feature is to replace the query engine with one that can use indexes explicitly defined by administrators. The query engine should still work when no indexes are defined, though it will be slower (potentially a lot slower) than if proper indexes are defined for a query. As with a regular relational database, which indexes you define will depend heavily on the queries you are using.
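The intended behavior (use an index when one is defined, otherwise fall back to a slower scan) can be illustrated with a minimal sketch. This is not ModeShape code: the class `IndexedQuerySketch` and all of its methods are hypothetical names invented for illustration, nodes are reduced to a path plus numeric properties, and a `TreeMap` stands in for a B-tree-style index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names are ModeShape APIs.
class IndexedQuerySketch {
    // path -> (property name -> numeric value); stands in for repository content
    private final Map<String, Map<String, Long>> nodes = new LinkedHashMap<>();
    // property name -> sorted (value -> paths); exists only for explicitly defined indexes
    private final Map<String, NavigableMap<Long, List<String>>> indexes = new HashMap<>();

    /** Sets a property and keeps any index defined on that property up to date. */
    void setProperty(String path, String property, long value) {
        Long old = nodes.computeIfAbsent(path, p -> new HashMap<>()).put(property, value);
        NavigableMap<Long, List<String>> index = indexes.get(property);
        if (index == null) return; // property is not indexed: nothing else to do
        if (old != null) {
            List<String> paths = index.get(old);
            paths.remove(path);
            if (paths.isEmpty()) index.remove(old);
        }
        index.computeIfAbsent(value, v -> new ArrayList<>()).add(path);
    }

    /** Explicitly defines an index on one property, building it from existing content. */
    void defineIndex(String property) {
        NavigableMap<Long, List<String>> index = new TreeMap<>();
        nodes.forEach((path, props) -> {
            Long value = props.get(property);
            if (value != null) index.computeIfAbsent(value, v -> new ArrayList<>()).add(path);
        });
        indexes.put(property, index);
    }

    boolean usesIndex(String property) {
        return indexes.containsKey(property);
    }

    /** Returns the sorted paths of nodes whose property value lies in [low, high]. */
    List<String> findInRange(String property, long low, long high) {
        List<String> result = new ArrayList<>();
        NavigableMap<Long, List<String>> index = indexes.get(property);
        if (index != null) {
            // Indexed: visit only the entries inside the requested range.
            index.subMap(low, true, high, true).values().forEach(result::addAll);
        } else {
            // No index: still answer the query, but by scanning every node.
            nodes.forEach((path, props) -> {
                Long value = props.get(property);
                if (value != null && value >= low && value <= high) result.add(path);
            });
        }
        Collections.sort(result);
        return result;
    }
}
```

In this sketch both paths return the same results; defining an index changes only how much content has to be visited to answer the query.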

Additionally, indexes should be able to be stored/accessed using several "index provider" mechanisms, including:

"internal" indexes (e.g., local files via MapDB; see MODE-2160)
local file-system-based indexes using Lucene
indexes in Solr
indexes in ElasticSearch

Using explicitly-defined indexes would perform a lot better, as we'd only be indexing the information that needs to be indexed rather than all of the content, as we do with 3.x. Plus this will make clustering easier, since it (along with the journal service) makes it far easier to bring a process up and update the indexes after a process has been out of the cluster for a period of time.
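As a rough sketch of the catch-up idea, assume the journal is an ordered list of property changes, each with a unique, increasing sequence number. An index that remembers the last sequence number it applied can then be brought up to date by replaying only the newer entries, skipping properties it does not index. The names `JournalReplaySketch` and `Change` are hypothetical and do not reflect ModeShape's journal or index APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch; these types are not ModeShape's journal or index APIs.
class JournalReplaySketch {

    /** One journaled property change; a null newValue means the property was removed. */
    static final class Change {
        final long sequence;
        final String path;
        final String property;
        final String newValue;

        Change(long sequence, String path, String property, String newValue) {
            this.sequence = sequence;
            this.path = path;
            this.property = property;
            this.newValue = newValue;
        }
    }

    private final String indexedProperty;
    private final Map<String, String> valueByPath = new TreeMap<>();
    private long lastApplied = Long.MIN_VALUE;

    JournalReplaySketch(String indexedProperty) {
        this.indexedProperty = indexedProperty;
    }

    /**
     * Replays journal entries newer than the last one applied. Assumes the journal is
     * ordered by sequence number. Returns how many entries changed this index.
     */
    int catchUp(List<Change> journal) {
        int applied = 0;
        for (Change change : journal) {
            if (change.sequence <= lastApplied) continue; // already seen before going offline
            lastApplied = change.sequence;
            if (!indexedProperty.equals(change.property)) continue; // not indexed: ignore
            if (change.newValue == null) {
                valueByPath.remove(change.path);
            } else {
                valueByPath.put(change.path, change.newValue);
            }
            applied++;
        }
        return applied;
    }

    /** Returns the paths whose indexed property currently equals the given value. */
    Set<String> pathsWithValue(String value) {
        Set<String> paths = new TreeSet<>();
        valueByPath.forEach((path, v) -> {
            if (v.equals(value)) paths.add(path);
        });
        return paths;
    }
}
```

Because the index is updated from the change entries alone, the sketch never needs to reload the affected nodes or rebuild the index from scratch after a process rejoins.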

Issue Links

blocks

MODE-1869 Embed the Teiid relational query engine

Resolved

MODE-2159 Store indexes in local Lucene

Resolved

MODE-2160 Store indexes on the local file system (using MapDB)

Resolved

MODE-2162 Store indexes in ElasticSearch

Resolved

MODE-2166 Allow cast of dynamic operand in queries

Resolved

MODE-2161 Store indexes in Solr

Open

MODE-1903 Rebuild indexes from a point in time

Resolved

MODE-2184 Add public API methods to close query results

Resolved

MODE-2157 The configuration option to disable queries is not needed and should be removed

Resolved

incorporates

MODE-2151 Add JCR-SQL2 dynamic operand for number of child nodes

Resolved

MODE-2163 Add support in JCR-SQL2 for ordering with nulls first or last

Resolved

is blocked by

MODE-1372 Internal RepositoryCache CreateNode events should include node type info for new node and parent

Resolved

MODE-2023 Refactor query and indexing functionality behind interfaces

Resolved

relates to

MODE-2159 Store indexes in local Lucene

Resolved

MODE-1671 Provide a way to specify a node's identifier in queries

Resolved

MODE-2138 JCR-SQL2 query throws NPE if ordering by [jcr:path]

Resolved

(4 blocks, 2 incorporates, 2 is blocked by, 3 relates to)

People

Assignee: Randall Hauch (Inactive)
Reporter: Randall Hauch (Inactive)
Votes: 0
Watchers: 3

Dates

Created: 2013/08/29 10:40 AM
Updated: 2020/09/14 5:18 AM
Resolved: 2014/03/26 10:42 AM