Feature Request
Resolution: Done
Blocker
3.5.0.Final
None
ModeShape 3.x maintains a single Lucene index for each repository, but there are several issues with this:
- In a clustered deployment, maintaining and synchronizing the different Lucene indexes is difficult, complex, and error-prone, and the indexes can become inconsistent (since the master content is transferred to the slaves only periodically).
- The Lucene index is used for querying all fields, even when the values and criteria don't involve search (e.g., numeric fields, exact text matches, pattern matches, etc.). Lucene is not ideal for these kinds of queries, whereas traditional indexes (e.g., based upon B*-tree or similar) would be far more efficient and effective.
- Using a single Lucene index for a whole repository is far from ideal, and it leads to concurrency problems (even in a local, non-clustered case).
- When nodes are changed, the whole document in Lucene must be updated. This means we can't really update the Lucene index with just what's changed, and thus updating the index requires accessing the node rather than just working from the events.
- Using Lucene (and Hibernate Search) adds a number of dependencies and complicates the build process, especially for the EAP kit.
- We're currently indexing all properties. Doing so does mean that users can use any properties in their criteria, but it also means that the indexes are large and updating/replicating them takes longer. Ideally, we can offer the ability to index only specific properties that are actually used in query criteria. Doing this with Lucene would be difficult.
- When a process leaves the cluster (e.g., is taken down) and then (re)joins it, ModeShape has no option other than to completely reindex the content (or, if master-slave is used, to copy the indexes, though this copying leads to other inconsistencies).
The objective of this feature is to replace the query engine with one that can use indexes explicitly defined by administrators. The query engine should still work when no indexes are defined, though it will be slower (potentially a lot slower) than if proper indexes are defined for a query. As with a regular relational database, which indexes you define will depend heavily on the queries you are using.
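The behavior described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (`IndexedQuerySketch`, `defineIndex`, `setProperty`, `findInRange`) are hypothetical and are not ModeShape APIs, properties are reduced to numeric values, and a `TreeMap` stands in for a B-tree-like sorted index. A range query is answered from the index when one has been defined for the property, and by scanning every node otherwise.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names are ModeShape APIs.
final class IndexedQuerySketch {
    // Node path -> (property name -> numeric value).
    private final Map<String, Map<String, Long>> nodes = new HashMap<>();
    // Property name -> sorted index (value -> node paths).
    private final Map<String, TreeMap<Long, List<String>>> indexes = new HashMap<>();
    // Records whether the most recent query was answered from an index.
    boolean lastQueryUsedIndex;

    // An administrator explicitly defines an index on one property;
    // nodes that already exist are indexed at definition time.
    void defineIndex(String property) {
        TreeMap<Long, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, Map<String, Long>> e : nodes.entrySet()) {
            Long value = e.getValue().get(property);
            if (value != null) {
                index.computeIfAbsent(value, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        indexes.put(property, index);
    }

    // Stores a value and updates only the index for that property, if one is defined.
    void setProperty(String path, String property, long value) {
        Long old = nodes.computeIfAbsent(path, k -> new HashMap<>()).put(property, value);
        TreeMap<Long, List<String>> index = indexes.get(property);
        if (index == null) {
            return; // property is not indexed, nothing else to maintain
        }
        if (old != null) {
            List<String> paths = index.get(old);
            if (paths != null) {
                paths.remove(path);
                if (paths.isEmpty()) {
                    index.remove(old);
                }
            }
        }
        index.computeIfAbsent(value, k -> new ArrayList<>()).add(path);
    }

    // Returns the sorted paths of all nodes whose property lies in [low, high].
    List<String> findInRange(String property, long low, long high) {
        List<String> result = new ArrayList<>();
        TreeMap<Long, List<String>> index = indexes.get(property);
        lastQueryUsedIndex = index != null;
        if (low > high) {
            return result; // empty range
        }
        if (index != null) {
            // Indexed path: visit only the entries inside the requested range.
            for (List<String> paths : index.subMap(low, true, high, true).values()) {
                result.addAll(paths);
            }
        } else {
            // Fallback path: no index defined, so every node is examined.
            for (Map.Entry<String, Map<String, Long>> e : nodes.entrySet()) {
                Long value = e.getValue().get(property);
                if (value != null && value >= low && value <= high) {
                    result.add(e.getKey());
                }
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

Both paths return the same results; the difference is that the fallback scan touches every node, which is why a query without a suitable index is expected to be slower.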
Additionally, indexes should be able to be stored/accessed using several "index provider" mechanisms, including:
- "internal" indexes (e.g., local files via MapDB; see MODE-2160)
- local file-system-based indexes using Lucene
- indexes in Solr
- indexes in ElasticSearch
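One way to picture the "index provider" idea is as a small interface that each storage mechanism implements, with index definitions naming the provider they should use. The sketch below is an assumption made for illustration, not the ModeShape SPI: every type name (`PropertyIndexSketch`, `IndexProviderSketch`, `InMemoryProviderSketch`, `IndexProviderRegistrySketch`) and the provider name `"local"` are hypothetical, and the only backend shown is an in-memory stand-in rather than MapDB, Lucene, Solr, or ElasticSearch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical names throughout; this is not the ModeShape SPI.
interface PropertyIndexSketch {
    // Associates a property value with the path of the node that holds it.
    void put(String value, String nodePath);

    // Returns the paths of all nodes holding exactly this value.
    Set<String> lookup(String value);
}

// Each storage mechanism would supply its own implementation of this interface.
interface IndexProviderSketch {
    PropertyIndexSketch createIndex(String indexName);
}

// Stand-in backend that keeps its entries in memory.
final class InMemoryProviderSketch implements IndexProviderSketch {
    @Override
    public PropertyIndexSketch createIndex(String indexName) {
        final Map<String, Set<String>> entries = new HashMap<>();
        return new PropertyIndexSketch() {
            @Override
            public void put(String value, String nodePath) {
                entries.computeIfAbsent(value, k -> new TreeSet<>()).add(nodePath);
            }

            @Override
            public Set<String> lookup(String value) {
                return entries.getOrDefault(value, Collections.emptySet());
            }
        };
    }
}

// Maps provider names to providers so an index definition can pick one by name.
final class IndexProviderRegistrySketch {
    private final Map<String, IndexProviderSketch> providers = new HashMap<>();

    void register(String providerName, IndexProviderSketch provider) {
        providers.put(providerName, provider);
    }

    PropertyIndexSketch defineIndex(String indexName, String providerName) {
        IndexProviderSketch provider = providers.get(providerName);
        if (provider == null) {
            throw new IllegalArgumentException("Unknown index provider: " + providerName);
        }
        return provider.createIndex(indexName);
    }
}
```

With this shape, the query engine only ever talks to the index interface, so adding another storage mechanism means registering one more provider rather than changing the engine.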
Using explicitly defined indexes should perform a lot better, since we'd index only the information that needs to be indexed rather than all of the content, as we do with 3.x. It will also make clustering easier, since it (along with the journal service) makes it far easier to bring a process up and update its indexes after the process has been out of the cluster for a period of time.
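The catch-up scenario can be illustrated with a rough Java sketch. It assumes, purely for illustration, that the journal is an ordered list of timestamped changes and that an index is a simple path-to-value map; the names `JournalCatchUpSketch`, `Change`, `record`, and `catchUp` are hypothetical and do not describe the actual journal service. A process that rejoins replays only the changes newer than the last one it applied, instead of reindexing all content.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; the real journal service and event types are not described here.
final class JournalCatchUpSketch {
    // One journaled change to an indexed value; a null value means the entry was removed.
    static final class Change {
        final long timestamp;
        final String path;
        final String value;

        Change(long timestamp, String path, String value) {
            this.timestamp = timestamp;
            this.path = path;
            this.value = value;
        }
    }

    // Changes are assumed to be appended in timestamp order.
    private final List<Change> journal = new ArrayList<>();

    void record(long timestamp, String path, String value) {
        journal.add(new Change(timestamp, path, value));
    }

    // Applies every journaled change newer than lastSeen to the given index and
    // returns the timestamp of the newest change applied (or lastSeen if none).
    long catchUp(Map<String, String> index, long lastSeen) {
        long newest = lastSeen;
        for (Change change : journal) {
            if (change.timestamp <= lastSeen) {
                continue; // already reflected in this process's index
            }
            if (change.value == null) {
                index.remove(change.path);
            } else {
                index.put(change.path, change.value);
            }
            newest = Math.max(newest, change.timestamp);
        }
        return newest;
    }
}
```

The process only needs to remember the returned timestamp between runs; the cost of rejoining is then proportional to the number of changes missed rather than to the size of the repository.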
blocks:
- MODE-1869 Embed the Teiid relational query engine (Resolved)
- MODE-2159 Store indexes in local Lucene (Resolved)
- MODE-2160 Store indexes on the local file system (using MapDB) (Resolved)
- MODE-2162 Store indexes in ElasticSearch (Resolved)
- MODE-2166 Allow cast of dynamic operand in queries (Resolved)
- MODE-2161 Store indexes in Solr (Open)
- MODE-1903 Rebuild indexes from a point in time (Resolved)
- MODE-2184 Add public API methods to close query results (Resolved)
- MODE-2157 The configuration option to disable queries is not needed and should be removed (Resolved)
incorporates:
- MODE-2151 Add JCR-SQL2 dynamic operand for number of child nodes (Resolved)
- MODE-2163 Add support in JCR-SQL2 for ordering with nulls first or last (Resolved)
is blocked by:
- MODE-1372 Internal RepositoryCache CreateNode events should include node type info for new node and parent (Resolved)
- MODE-2023 Refactor query and indexing functionality behind interfaces (Resolved)
relates to:
- MODE-2159 Store indexes in local Lucene (Resolved)
- MODE-1671 Provide a way to specify a node's identifier in queries (Resolved)
- MODE-2138 JCR-SQL2 query throws NPE if ordering by [jcr:path] (Resolved)