ModeShape / MODE-1561

TikaTextExtractor fails if extracted text exceeds 100,000 characters


The TikaTextExtractor uses the default write limit for its ContentHandler. This default is 100,000 characters, which is far too low to extract text from even mid-size documents (e.g. 2.5 MB). Please increase the default limit or make it configurable in the repository configuration file.
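
A minimal sketch of what a configurable limit could look like, assuming only the JDK's SAX API (this is not Tika or ModeShape code). `LimitedTextHandler` is a hypothetical handler: a negative limit means "unlimited", and reaching the limit truncates the text and sets a flag instead of aborting the parse. For reference, Tika's own `BodyContentHandler` also has a constructor that takes a write limit, where `-1` disables the limit; its no-argument constructor applies the 100,000-character default.

```java
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: buffers character data up to a configurable limit.
class LimitedTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final int writeLimit; // negative value = no limit
    private boolean truncated = false;

    LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (writeLimit < 0) {
            // Unlimited: keep everything.
            text.append(ch, start, length);
            return;
        }
        int remaining = writeLimit - text.length();
        if (length > remaining) {
            // Keep what still fits, then remember that the text was cut off.
            text.append(ch, start, Math.max(remaining, 0));
            truncated = true;
        } else {
            text.append(ch, start, length);
        }
    }

    String getText() {
        return text.toString();
    }

    boolean isTruncated() {
        return truncated;
    }
}
```

The handler can be passed to `javax.xml.parsers.SAXParser#parse(InputSource, DefaultHandler)`. Truncating rather than throwing is a design choice for this sketch; an extractor could equally keep Tika's exception-on-limit behavior and index the partial text it collected.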

Also, please use the logging facility to report any parser problems.
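
A minimal sketch of reporting parser problems through a logger instead of failing silently, assuming `java.util.logging` and the JDK SAX parser as stand-ins (ModeShape's own logging facility and Tika's parser are not shown). `LoggingExtractor` and `TextCollector` are hypothetical names; the point is that the document name and the underlying exception both reach the log.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class LoggingExtractor {
    static final Logger LOGGER = Logger.getLogger(LoggingExtractor.class.getName());

    // Hypothetical stand-in for the extractor's content handler: collects all text.
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    // Parses the document; on failure, logs a WARNING naming the document
    // (with the exception attached) and returns whatever text was collected.
    static String extract(String documentName, String xml) {
        TextCollector collector = new TextCollector();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), collector);
        } catch (SAXException | IOException | ParserConfigurationException e) {
            LOGGER.log(Level.WARNING, "Text extraction failed for " + documentName, e);
        }
        return collector.text.toString();
    }
}
```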

Assignee: Horia Chiorean (Inactive)
Reporter: Niels Lippke (Inactive)
Votes: 0
Watchers: 3
