Details

Type: Enhancement
Resolution: Done
Priority: Major
Fix Version/s: 5.4.0.Final
Affects Version/s: 5.3.0.Final
Component/s: Indexing
Labels: None

Forum Reference: https://developer.jboss.org/message/967732
Git Pull Request: https://github.com/ModeShape/modeshape/pull/1627

Description

While evaluating performance with the Lucene index provider, I noticed that queries using VALUE indexes on simple string properties were slow. I put together a small test application that inserts 1000 nodes of a type with an index on a single string field, populating the field with a unique sequence of values so that the cardinality is very high. The test then runs 1000 searches using that field as the constraint with a random value. The test Java code, ModeShape config, and CND are attached. The results were as follows (ModeShape 5.3.0, Windows 7 x64, Java 1.8.0_91):

Inserted 1000 in 580 ms.
Searched 1000 nodes 1000 times in 57314 ms.
Deleted 1000 in 25 ms.

Under a profiler, all of the time is spent in the IndexReader.document call within ConstantScoreWeightQuery.java. From the code, it appears that this query does a linear search of the index and forces Lucene to instantiate a full document for each entry. Following that logic, I changed the EQUAL_TO case in LuceneQueryFactory.stringFieldQuery from:
case EQUAL_TO:
return CompareStringQuery.createQueryForNodesWithFieldEqualTo(stringValue, field, factories, caseOperation);
to just using the built-in Lucene TermQuery:
case EQUAL_TO:
return new TermQuery(new Term(field, stringValue));

The results running with this change are:
Inserted 1000 in 627 ms.
Searched 1000 nodes 1000 times in 1327 ms.
Deleted 1000 in 24 ms.

That is roughly a 43x improvement in search time (57314 ms down to 1327 ms), which seems pretty good, and other testing so far suggests the results are still correct.

CompareStringQuery may be necessary for operations such as regular expression matching that are not implemented natively by Lucene, but the simple string equality case seems like it should be delegated to Lucene.
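
To illustrate the distinction, here is a toy model in plain Java. It is only a sketch and is not ModeShape or Lucene code: ToyIndex, scan, termLookup, and the documentsLoaded counter are hypothetical names invented for this example. It assumes the slow path behaves like a per-document scan (as the profiler output suggests) and models an exact-match term lookup as a single probe of an inverted index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Toy model of a single-field index; hypothetical, not ModeShape or Lucene code. */
public class TermLookupSketch {

    /** Stored field values plus an inverted index from exact value to document ids. */
    static final class ToyIndex {
        private final List<String> storedValues = new ArrayList<>();
        private final Map<String, List<Integer>> postings = new HashMap<>();
        int documentsLoaded = 0; // counts simulated "load the stored document" calls

        void add(String value) {
            int docId = storedValues.size();
            storedValues.add(value);
            postings.computeIfAbsent(value, k -> new ArrayList<>()).add(docId);
        }

        /** Linear scan: loads every stored document and tests it against the predicate. */
        List<Integer> scan(Predicate<String> matches) {
            List<Integer> hits = new ArrayList<>();
            for (int docId = 0; docId < storedValues.size(); docId++) {
                documentsLoaded++;
                if (matches.test(storedValues.get(docId))) {
                    hits.add(docId);
                }
            }
            return hits;
        }

        /** Exact-match lookup: one map probe, no stored documents loaded. */
        List<Integer> termLookup(String value) {
            return postings.getOrDefault(value, Collections.<Integer>emptyList());
        }
    }

    /** Builds an index of n documents with unique values "value-0" .. "value-(n-1)". */
    static ToyIndex build(int n) {
        ToyIndex index = new ToyIndex();
        for (int i = 0; i < n; i++) {
            index.add("value-" + i);
        }
        return index;
    }

    public static void main(String[] args) {
        ToyIndex index = build(1000);

        List<Integer> byScan = index.scan(v -> v.equals("value-42"));
        int loadedByScan = index.documentsLoaded;

        List<Integer> byTerm = index.termLookup("value-42");
        int loadedByTerm = index.documentsLoaded - loadedByScan;

        System.out.println("scan hits=" + byScan + ", documents loaded=" + loadedByScan);
        System.out.println("term hits=" + byTerm + ", documents loaded=" + loadedByTerm);
    }
}
```

Both paths return the same hit, but the scan touches all 1000 stored values while the term lookup touches none. A predicate that cannot be expressed as an exact value (for example, a pattern match) would still need the scan path in this model, which mirrors why equality is the case worth delegating.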

Attachments

modeshape-index-test.zip
124 kB
2017/01/20 10:18 AM

People

Assignee: Horia Chiorean (Inactive)

Reporter: Matthew Bachmann (Inactive)

Votes: 0

Watchers: 2

Dates

Created: 2017/01/20 10:18 AM

Updated: 2017/02/14 1:27 AM

Resolved: 2017/02/14 1:27 AM