eXo-JCR / EXOJCR-1713

Parallel text extraction doesn't work with clustered index strategies

    Details

      Description

      The use case is the following:

      • A single save() contains multiple structured documents (such as PDF, ODT, etc.);
      • When saving into the index, JCR creates a thread pool executor that extracts text from the documents in parallel.

      However, when the replicated volatile index was introduced, this feature stopped working. The root cause is the serialization of Lucene Documents: serialization reads the content of each Lucene Document, which invokes text extraction. This happens before the Lucene Document is placed into the index, that is, before the thread pool executor is used. As a result, the thread pool provides no performance gain.
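
The effect described above can be reproduced with a minimal Java sketch. This is not the eXo JCR or Lucene code: LazyText, index and the readBeforeSubmit flag are hypothetical names, the upper-casing stands in for real text extraction, and the early read is only an assumption about how serialization touches the content. The sketch records which threads actually perform extraction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LazyExtractionDemo {

    /** Hypothetical stand-in for a document field whose text is extracted on first read. */
    static final class LazyText {
        private final String source;
        private final Set<String> extractingThreads;
        private String text; // stays null until the first read

        LazyText(String source, Set<String> extractingThreads) {
            this.source = source;
            this.extractingThreads = extractingThreads;
        }

        synchronized String get() {
            if (text == null) {
                // Record which thread paid for the extraction.
                extractingThreads.add(Thread.currentThread().getName());
                text = source.toUpperCase(); // placeholder for real, expensive extraction
            }
            return text;
        }
    }

    /**
     * Indexes the sources on a thread pool and returns the names of the threads
     * that performed extraction. When readBeforeSubmit is true, the caller reads
     * each field first, as a serialization step would.
     */
    static Set<String> index(List<String> sources, boolean readBeforeSubmit) {
        Set<String> extractingThreads = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String source : sources) {
                LazyText field = new LazyText(source, extractingThreads);
                if (readBeforeSubmit) {
                    field.get(); // assumption: models serialization reading the content
                }
                Callable<String> task = field::get;
                pending.add(pool.submit(task));
            }
            for (Future<String> result : pending) {
                result.get(); // wait for every indexing task to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return extractingThreads;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a.pdf", "b.odt", "c.pdf");
        System.out.println("pool only:       " + index(docs, false));
        System.out.println("serialize first: " + index(docs, true));
    }
}
```

With readBeforeSubmit set to false, only pool worker threads appear in the result. With it set to true, the calling thread is the only one that extracts, and the pool tasks merely return text that is already cached, which matches the lost parallelism reported here.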

              People

              • Assignee: Anatolii Bazko (tolusha)
              • Reporter: Nikolazy Zamosenchuk (nzamosenchuk)

                  Time Tracking

                  Estimated: 3d
                  Remaining: 2d
                  Logged: 1d