- Type: Bug
- Resolution: Done
- Priority: Blocker
- Fix Version: 2.6.0.Final
- Component: None
When re-indexing an entire workspace, the SearchIndexer walks through the content by reading subgraphs, where each subgraph's depth is dictated by the "indexReadDepth" repository option. The SearchIndexer code is structured to not hold onto the subgraphs longer than needed, so that the garbage collector can reclaim memory as the indexer walks through the content.
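For illustration, here is a minimal sketch of that walking strategy. All names (WorkspaceWalker, ContentReader, Subgraph, Indexer) are hypothetical stand-ins, not the actual ModeShape API; the point is only that each bounded-depth subgraph is released before the next one is read:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

interface ContentReader {
    /** Read the subtree rooted at 'path', at most 'depth' levels deep. */
    Subgraph readSubgraph(String path, int depth);
}

interface Subgraph {
    /** Paths of the nodes at the maximum depth, which root the next batches. */
    List<String> childPathsAtMaxDepth();
}

interface Indexer {
    void index(Subgraph subgraph);
}

final class WorkspaceWalker {
    private final int indexReadDepth; // mirrors the "indexReadDepth" option

    WorkspaceWalker(int indexReadDepth) {
        this.indexReadDepth = indexReadDepth;
    }

    void reindex(ContentReader reader, Indexer indexer) {
        Deque<String> pending = new ArrayDeque<>();
        pending.push("/");
        while (!pending.isEmpty()) {
            // Read one bounded subgraph and index it.
            Subgraph subgraph = reader.readSubgraph(pending.pop(), indexReadDepth);
            indexer.index(subgraph);
            for (String child : subgraph.childPathsAtMaxDepth()) {
                pending.push(child);
            }
            // 'subgraph' goes out of scope here, so the batch just indexed
            // becomes eligible for garbage collection before the next read.
        }
    }
}
```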
However, the SearchIndexer is inadvertently holding onto all of the requests, preventing them from being garbage collected. Normally this isn't a problem, but for very large repositories or repositories that contain large binary values, the result is an OutOfMemoryError.
The culprit turns out to be the SearchIndexer's use of CompositeRequestChannel, which keeps every processed request in a list. That behavior is necessary in other situations, such as the federation connector, where only a small number of requests are processed; for the SearchIndexer, however, the number of requests can be very large. Since the SearchIndexer never re-processes the requests, in this case the CompositeRequestChannel can be configured to not accumulate the processed requests.
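A minimal sketch of the accumulation problem and the opt-out, assuming a simplified channel with a 'keepProcessedRequests' constructor flag. The class, flag, and method names are illustrative, not the exact CompositeRequestChannel API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class RequestChannel<R> {
    private final BlockingQueue<R> queue = new LinkedBlockingQueue<>();
    private final List<R> processed;            // null when accumulation is off
    private final boolean keepProcessedRequests;

    RequestChannel(boolean keepProcessedRequests) {
        this.keepProcessedRequests = keepProcessedRequests;
        this.processed = keepProcessedRequests ? new ArrayList<>() : null;
    }

    void submit(R request) throws InterruptedException {
        queue.put(request);
    }

    R take() throws InterruptedException {
        R request = queue.take();
        if (keepProcessedRequests) {
            // A consumer like the federation connector needs the processed
            // requests afterward, so it keeps them. For the SearchIndexer,
            // this list grows with every node read (and pins large binary
            // values), eventually exhausting the heap.
            processed.add(request);
        }
        // With accumulation off, the caller holds the only reference, so
        // each request becomes garbage as soon as it has been indexed.
        return request;
    }
}
```

Under this sketch, the indexer would construct the channel with 'new RequestChannel<>(false)', while consumers that genuinely need the processed requests would pass 'true'.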