Details
- Feature Request
- Resolution: Done
- Major
Description
JDG currently supports two indexing strategies:
1) a single cluster-wide index, stored in caches
2) an index replicated on every node, stored either in RAM or on the filesystem of each node
Both strategies scale poorly because the index must be present on the node that runs the query. Furthermore, 2) only works for REPL caches, leaving 1) as the only supported strategy for DIST caches.
Strategy 1) has two issues. First, a single node in the cluster is responsible for all the indexing. Second, it expects the (global) index to be accessible on the node where the query runs. If the query needs to read an index segment that is not local, the query engine fetches it before running the query, which causes high latency because of the number of RPCs and the volume of data transferred.
The broadcast query feature [1] lets each node index its own data during writes and, at query time, sends the query to every node. An extra step is required to combine the results from all nodes. This is ideal for DIST caches with large indexes, since the only data transferred is the query itself and the results.
[1] http://infinispan.org/docs/stable/user_guide/user_guide.html#query.clustered-query-api
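The broadcast flow described above (index locally on write, send the query to every node, merge the partial results) can be illustrated with a minimal, self-contained sketch. It does not use the Infinispan API: `Node`, `Hit`, `queryLocal` and `broadcastQuery` are hypothetical names, the "index" is a plain map, scoring is a simple term count, and key ownership is approximated with a hash modulo the node count.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BroadcastQuerySketch {

    /** A scored match returned by one node (hypothetical type). */
    static final class Hit {
        final String key;
        final int score;
        Hit(String key, int score) { this.key = key; this.score = score; }
    }

    /** Highest score first, ties broken by key so the order is deterministic. */
    static final Comparator<Hit> BY_SCORE_DESC = (a, b) -> {
        int byScore = Integer.compare(b.score, a.score);
        return byScore != 0 ? byScore : a.key.compareTo(b.key);
    };

    /** Hypothetical stand-in for a cluster node that indexes only the entries it owns. */
    static final class Node {
        private final Map<String, String> localIndex = new HashMap<>();

        // Write path: the owning node indexes the entry locally.
        void put(String key, String text) {
            localIndex.put(key, text.toLowerCase());
        }

        // Query path: evaluate the query against the local index only.
        List<Hit> queryLocal(String term, int limit) {
            String needle = term.toLowerCase();
            List<Hit> hits = new ArrayList<>();
            for (Map.Entry<String, String> e : localIndex.entrySet()) {
                int count = countOccurrences(e.getValue(), needle);
                if (count > 0) hits.add(new Hit(e.getKey(), count));
            }
            hits.sort(BY_SCORE_DESC);
            return new ArrayList<>(hits.subList(0, Math.min(limit, hits.size())));
        }
    }

    static int countOccurrences(String text, String needle) {
        if (needle.isEmpty()) return 0;
        int count = 0;
        for (int i = text.indexOf(needle); i >= 0; i = text.indexOf(needle, i + needle.length())) {
            count++;
        }
        return count;
    }

    /** Sends the query to every node, then combines the partial results. */
    static List<Hit> broadcastQuery(List<Node> nodes, String term, int limit) {
        List<Hit> merged = new ArrayList<>();
        for (Node node : nodes) {
            // In a real cluster this would be one RPC carrying only the query.
            merged.addAll(node.queryLocal(term, limit));
        }
        merged.sort(BY_SCORE_DESC); // the extra combine step
        return new ArrayList<>(merged.subList(0, Math.min(limit, merged.size())));
    }

    /** Builds a three-node toy cluster and runs one broadcast query. */
    static List<Hit> demo() {
        List<Node> nodes = new ArrayList<>();
        for (int i = 0; i < 3; i++) nodes.add(new Node());
        String[][] entries = {
            {"k1", "infinispan query query"},
            {"k2", "broadcast query"},
            {"k3", "no match here"},
            {"k4", "query query query"},
        };
        for (String[] e : entries) {
            // Assumed single-owner placement, standing in for DIST ownership.
            nodes.get(Math.floorMod(e[0].hashCode(), nodes.size())).put(e[0], e[1]);
        }
        return broadcastQuery(nodes, "query", 2);
    }

    public static void main(String[] args) {
        for (Hit h : demo()) System.out.println(h.key + "=" + h.score);
    }
}
```

Each node returns at most `limit` hits, so the coordinator receives at most `limit` times the node count, regardless of index size. This is the property that makes the approach attractive for large DIST indexes.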
Issue Links
- is blocked by ISPN-6395 Unify clustered queries with non clustered queries (Closed)