- Bug
- Resolution: Done
- Major
- 3.5.0.Final
- None
If a repository has a tika-text-extractor configured and a binary file representing a Tika-recognized image is uploaded (e.g. JPEG, GIF, BMP), the text extraction fails with:
5:57:05,750 ERROR [org.modeshape.extractor.tika.TikaTextExtractor] (modeshape-text-extractor-5-thread-1) Error while extracting text : com/drew/metadata/MetadataException: java.lang.NoClassDefFoundError: com/drew/metadata/MetadataException
    at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.modeshape.extractor.tika.TikaTextExtractor$1.execute(TikaTextExtractor.java:134)
    at org.modeshape.jcr.api.text.TextExtractor.processStream(TextExtractor.java:82) [modeshape-jcr-api-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
    at org.modeshape.extractor.tika.TikaTextExtractor.extractFrom(TikaTextExtractor.java:124)
    at org.modeshape.jcr.TextExtractors$Worker.run(TextExtractors.java:182) [modeshape-jcr-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45]
    at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]
Caused by: java.lang.ClassNotFoundException: com.drew.metadata.MetadataException from [Module "org.apache.tika:1.3" from local module loader @3336a1a1 (finder: local module finder @47ad6b4b (roots: d:\Work\hchiorean.modeshape\integration\modeshape-jbossas-integration-tests\target\jboss-eap-6.1\modules,d:\Work\hchiorean.modeshape\integration\modeshape-jbossas-integration-tests\target\jboss-eap-6.1\modules\system\layers\base))]
    at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:196) [jboss-modules.jar:1.2.0.Final-redhat-1]
    at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:444) [jboss-modules.jar:1.2.0.Final-redhat-1]
    at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:432) [jboss-modules.jar:1.2.0.Final-redhat-1]
    at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:399) [jboss-modules.jar:1.2.0.Final-redhat-1]
    at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:374) [jboss-modules.jar:1.2.0.Final-redhat-1]
    at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:119) [jboss-modules.jar:1.2.0.Final-redhat-1]
    ... 9 more
This is not specific to EAP. It is a generic problem, caused by the fact that ModeShape's Tika dependency explicitly excludes the third-party library (the one providing the com.drew.metadata classes) that Tika uses to parse images.
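To confirm the diagnosis on a given deployment, one option is to probe whether the class named in the stack trace can be loaded at all. The following is a minimal sketch, not part of ModeShape or Tika: the class `ImageParserDependencyCheck` and its `isClassAvailable` helper are hypothetical names introduced here for illustration. The only detail taken from the report is the class name `com.drew.metadata.MetadataException`.

```java
// Hypothetical diagnostic helper (not part of ModeShape or Tika).
final class ImageParserDependencyCheck {

    // Class name taken from the stack trace in this report.
    static final String IMAGE_METADATA_CLASS = "com.drew.metadata.MetadataException";

    // Returns true if the named class can be found by the given class loader.
    // The class is not initialized; we only check that it resolves.
    static boolean isClassAvailable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: the class is missing from this loader.
            // LinkageError covers NoClassDefFoundError and related failures.
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ImageParserDependencyCheck.class.getClassLoader();
        boolean available = isClassAvailable(IMAGE_METADATA_CLASS, loader);
        System.out.println(IMAGE_METADATA_CLASS
                + (available ? " is available" : " is NOT available")
                + " to this class loader");
    }
}
```

Note that this checks the class loader of the helper itself. In a JBoss Modules environment such as the one in the trace, the result depends on which module's class loader performs the lookup, so the check is only meaningful when run with the same visibility as the Tika module.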