ModeShape / MODE-1561

TikaTextExtractor fails if extracted text exceeds 100000 characters

    Details

      Description

      The TikaTextExtractor uses the default write limit of its ContentHandler. This default is limited to 100000 characters, which is far too low to extract the text of even mid-size documents (e.g. 2.5 MB). Please increase the default limit or make it configurable in the repository configuration file.

      Also, please use the logging facility to report any parser problems.
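      A minimal sketch of what the request could look like, assuming a plain SAX ContentHandler is acceptable. LimitedTextHandler is a hypothetical class (not part of ModeShape or Tika): it takes the write limit as a constructor argument, treats a negative value as "no limit", and, instead of failing once the limit is reached, keeps the text collected so far and logs a warning through java.util.logging. It uses only the JDK, so it can be tried without Tika on the classpath.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

import org.xml.sax.helpers.DefaultHandler;

/**
 * Hypothetical SAX handler that collects character data up to a configurable limit.
 * A negative limit disables the limit. When the limit is exceeded, the text collected
 * so far is kept, the rest is skipped, and a single warning is logged.
 */
class LimitedTextHandler extends DefaultHandler {
    private static final Logger LOGGER = Logger.getLogger(LimitedTextHandler.class.getName());

    private final StringBuilder text = new StringBuilder();
    private final int writeLimit; // max characters to keep; < 0 means unlimited
    private boolean truncated = false;

    LimitedTextHandler(int writeLimit) {
        this.writeLimit = writeLimit;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (truncated) {
            return; // limit already reached; ignore the remaining content
        }
        if (writeLimit < 0 || text.length() + length <= writeLimit) {
            text.append(ch, start, length);
            return;
        }
        // Keep only the part of this chunk that still fits, then log the truncation once.
        int remaining = writeLimit - text.length();
        text.append(ch, start, remaining);
        truncated = true;
        LOGGER.log(Level.WARNING,
                "Extracted text exceeded {0} characters; remaining content was skipped",
                writeLimit);
    }

    /** True if some content was dropped because of the write limit. */
    boolean isTruncated() {
        return truncated;
    }

    @Override
    public String toString() {
        return text.toString();
    }
}
```

      For comparison, Tika's own BodyContentHandler has a BodyContentHandler(int writeLimit) constructor where -1 disables the limit, so passing an explicit value read from the repository configuration (the configuration property would be new) is another way to address this issue. Because Parser.parse(InputStream, ContentHandler, Metadata, ParseContext) accepts any ContentHandler, a handler like the one above could also be passed in place of the default one.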


                People

                • Assignee:
                  hchiorean Horia Chiorean
                  Reporter:
                  nl Niels Lippke
                • Votes:
                  0
                  Watchers:
                  3
