ModeShape: MODE-2022

Tika extractor fails while attempting to extract metadata from images


      If a repository has a tika-text-extractor configured and a binary file containing an image in a format Tika recognizes (e.g. JPEG, GIF or BMP) is uploaded, text extraction fails with:

      5:57:05,750 ERROR [org.modeshape.extractor.tika.TikaTextExtractor] (modeshape-text-extractor-5-thread-1) Error while extracting text : com/drew/metadata/MetadataException: java.lang.NoClassDefFoundError: com/drew/metadata/MetadataException
      	at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      	at org.modeshape.extractor.tika.TikaTextExtractor$1.execute(TikaTextExtractor.java:134)
      	at org.modeshape.jcr.api.text.TextExtractor.processStream(TextExtractor.java:82) [modeshape-jcr-api-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
      	at org.modeshape.extractor.tika.TikaTextExtractor.extractFrom(TikaTextExtractor.java:124)
      	at org.modeshape.jcr.TextExtractors$Worker.run(TextExtractors.java:182) [modeshape-jcr-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45]
      	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]
      Caused by: java.lang.ClassNotFoundException: com.drew.metadata.MetadataException from [Module "org.apache.tika:1.3" from local module loader @3336a1a1 (finder: local module finder @47ad6b4b (roots: d:\Work\hchiorean.modeshape\integration\modeshape-jbossas-integration-tests\target\jboss-eap-6.1\modules,d:\Work\hchiorean.modeshape\integration\modeshape-jbossas-integration-tests\target\jboss-eap-6.1\modules\system\layers\base))]
      	at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:196) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:444) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:432) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:399) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:374) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:119) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	... 9 more
      
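The trace follows a common JVM pattern. Tika's JpegParser class itself loaded, but resolving its reference to com.drew.metadata.MetadataException failed, so the JVM raised a NoClassDefFoundError whose cause is the ClassNotFoundException reported by the JBoss Modules class loader. The sketch below walks a cause chain to pull out the name of the missing class. MissingClassFinder and findMissingClass are hypothetical names used only for illustration; they are not part of ModeShape or Tika.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/**
 * Hypothetical diagnostic helper (not part of ModeShape or Tika): walks a
 * Throwable's cause chain and returns the first class the JVM failed to load.
 */
public final class MissingClassFinder {

    private MissingClassFinder() {
    }

    /** Returns the dotted name of the missing class, or null if none is found. */
    public static String findMissingClass(Throwable error) {
        // Identity set guards against (rare) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable t = error; t != null && seen.add(t); t = t.getCause()) {
            String message = t.getMessage();
            if (message == null) {
                continue;
            }
            if (t instanceof ClassNotFoundException) {
                // Dotted name comes first; JBoss Modules appends " from [Module ...]".
                return message.split(" ", 2)[0];
            }
            if (t instanceof NoClassDefFoundError && t.getCause() == null) {
                // No underlying cause: the message uses the internal form with slashes.
                return message.replace('/', '.');
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Rebuild the shape of the failure logged in this issue.
        NoClassDefFoundError error = new NoClassDefFoundError("com/drew/metadata/MetadataException");
        error.initCause(new ClassNotFoundException(
                "com.drew.metadata.MetadataException from [Module \"org.apache.tika:1.3\"]"));
        System.out.println(findMissingClass(error)); // prints com.drew.metadata.MetadataException
    }
}
```

Because NoClassDefFoundError is an Error rather than an Exception, code that only catches Exception will not see it; the helper therefore accepts any Throwable.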

      This is not specific to EAP; the problem is generic. It is caused by ModeShape's Tika dependency explicitly excluding the third-party library Tika uses to parse images (the library that provides the com.drew.metadata classes, i.e. Drew Noakes' metadata-extractor).
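One way to confirm this diagnosis in a given deployment is to ask the relevant class loader whether the class named in the trace is visible at all. The sketch below is a hypothetical check (ImageMetadataDependencyCheck and isLoadable are illustrative names, not ModeShape API). In an EAP/JBoss Modules deployment the loader that matters is the one of the org.apache.tika module, for example the loader of the JpegParser class from the trace; the main method here uses its own class loader only as a stand-in.

```java
/**
 * Hypothetical startup check (not part of ModeShape): verifies that the class
 * Tika's JPEG parsing needs at runtime is visible to a given class loader.
 */
public final class ImageMetadataDependencyCheck {

    /** Class named in the NoClassDefFoundError logged in this issue. */
    static final String REQUIRED_CLASS = "com.drew.metadata.MetadataException";

    private ImageMetadataDependencyCheck() {
    }

    /** Returns true if the loader can find the class; the class is not initialized. */
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class exists but one of its own dependencies could not be resolved.
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in loader; in a real deployment pass the loader that loads Tika's parsers.
        ClassLoader loader = ImageMetadataDependencyCheck.class.getClassLoader();
        if (isLoadable(REQUIRED_CLASS, loader)) {
            System.out.println(REQUIRED_CLASS + " is visible; image metadata parsing can run.");
        } else {
            System.out.println(REQUIRED_CLASS + " is missing; image parsing will fail with NoClassDefFoundError.");
        }
    }
}
```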

              Horia Chiorean (hchiorean)