  1. ModeShape
  2. MODE-1561

TikaTextExtractor fails if extracted text exceeds 100000 characters

Details

    Description

      The TikaTextExtractor uses the default value for its ContentHandler. This default limits extraction to 100000 characters, which is way too low to extract words from even mid-size documents (2.5 MB). Please increase the default size or make it configurable in the repository configuration file.

      Also, please use the logging facility to report any parser problems.
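
      To illustrate the two requests above (a configurable limit, and logging instead of failing), here is a minimal sketch built only on the JDK's SAX and logging classes. It is not ModeShape's or Tika's code: the class name LimitedTextHandler, its methods, and the convention that -1 means "no limit" are assumptions made for this example. Tika's own BodyContentHandler can also be constructed with an explicit write limit, which is probably the simpler fix inside the extractor.

```java
import java.util.logging.Logger;
import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical SAX handler that collects character data up to a
 * configurable limit and logs a warning when the limit is reached.
 */
class LimitedTextHandler extends DefaultHandler {
    private static final Logger LOG =
            Logger.getLogger(LimitedTextHandler.class.getName());

    private final int writeLimit;   // assumption: -1 means "no limit"
    private final StringBuilder text = new StringBuilder();
    private boolean truncated = false;

    LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (truncated) {
            return; // limit already reached, ignore further content
        }
        if (writeLimit < 0 || text.length() + length <= writeLimit) {
            text.append(ch, start, length);
            return;
        }
        // Keep what still fits, then report the truncation instead of failing.
        int remaining = writeLimit - text.length();
        text.append(ch, start, remaining);
        truncated = true;
        LOG.warning("Text extraction stopped after " + writeLimit
                + " characters; remaining content was skipped");
    }

    String getText() {
        return text.toString();
    }

    boolean isTruncated() {
        return truncated;
    }
}
```

      The limit passed to the constructor would come from the repository configuration; with a limit of -1 the handler never truncates.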

            People

              hchiorean Horia Chiorean (Inactive)
              nl_jira Niels Lippke (Inactive)
              Votes: 0
              Watchers: 3