Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 2.8.3.Final
Affects Version/s: 2.8.1.GA, 2.8.2.Final
Component/s: None
Labels: None

Git Pull Request:
https://github.com/ModeShape/modeshape/pull/491

When a repository is configured with both a MsOfficeSequencer and a TikaTextExtractor, the Tika extractor cannot extract content from MsOffice files.

The cause is a dependency conflict: the MsOffice sequencer enforces apache-poi version 3.7, while the tika-parsers_1.0 library needs at least a beta version of 3.8 to extract content from office documents. Tika uses the NPOIFS* classes from POI, which aren't present in 3.7.
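One way to confirm which side of the conflict a given deployment is on is to probe the classpath for one of the NPOIFS* classes. The following is a minimal sketch, not part of ModeShape or Tika; the class name `PoiClasspathCheck` and its methods are hypothetical, and it assumes that `org.apache.poi.poifs.filesystem.NPOIFSFileSystem` is a representative NPOIFS* class.

```java
public final class PoiClasspathCheck {

    // Assumption: this is one of the NPOIFS* classes that Tika relies on.
    static final String NPOIFS_CLASS = "org.apache.poi.poifs.filesystem.NPOIFSFileSystem";

    private PoiClasspathCheck() {
    }

    /** Returns true if the named class can be located, without initializing it. */
    public static boolean isClassPresent(String className) {
        try {
            Class.forName(className, false, PoiClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassPresent(NPOIFS_CLASS)) {
            System.out.println("NPOIFS classes found: the POI version on the classpath looks new enough");
        } else {
            System.out.println("NPOIFS classes missing: an older POI is probably on the classpath");
        }
    }
}
```

Running the `main` method with the same classpath as the repository shows whether the older POI version has won dependency resolution.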

To make matters worse, the error is well hidden, because any problems that occur during text extraction (see org.modeshape.search.lucene.LuceneSearchSession) are silently ignored.
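To illustrate why a silently ignored failure is hard to diagnose, here is a hedged sketch of a wrapper that logs extraction problems before giving up. It is not the code in LuceneSearchSession; `ExtractionGuard` and `extractOrWarn` are hypothetical names, and the assumption that a missing POI class surfaces as a `LinkageError` (for example `NoClassDefFoundError`) is not confirmed by this report.

```java
import java.util.concurrent.Callable;
import java.util.logging.Level;
import java.util.logging.Logger;

public final class ExtractionGuard {

    private static final Logger LOG = Logger.getLogger(ExtractionGuard.class.getName());

    private ExtractionGuard() {
    }

    /**
     * Runs the given extraction and returns its text, or null if it fails.
     * Failures are logged with their stack trace instead of being dropped.
     */
    public static String extractOrWarn(String documentName, Callable<String> extraction) {
        try {
            return extraction.call();
        } catch (Exception | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, which a missing
            // library class would typically raise (an assumption here).
            LOG.log(Level.WARNING, "Text extraction failed for " + documentName, e);
            return null;
        }
    }
}
```

With a wrapper like this, indexing would still continue after a failed extraction, but the log would name the affected document and the underlying cause.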

Assignee: Horia Chiorean (Inactive)

Reporter: Horia Chiorean (Inactive)

Votes: 0

Watchers: 1

Created: 2012/07/18 6:47 AM

Updated: 2013/10/24 5:20 AM

Resolved: 2012/08/24 9:28 AM
