ModeShape / MODE-2022

Tika extractor fails while attempting to extract metadata from images

Details

    Description

      If a repository is configured with a tika-text-extractor and a binary file that Tika recognizes as an image (e.g. JPEG, GIF, BMP) is uploaded, text extraction fails with:

      5:57:05,750 ERROR [org.modeshape.extractor.tika.TikaTextExtractor] (modeshape-text-extractor-5-thread-1) Error while extracting text : com/drew/metadata/MetadataException: java.lang.NoClassDefFoundError: com/drew/metadata/MetadataException
      	at org.apache.tika.parser.jpeg.JpegParser.parse(JpegParser.java:56)
      	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
      	at org.modeshape.extractor.tika.TikaTextExtractor$1.execute(TikaTextExtractor.java:134)
      	at org.modeshape.jcr.api.text.TextExtractor.processStream(TextExtractor.java:82) [modeshape-jcr-api-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
      	at org.modeshape.extractor.tika.TikaTextExtractor.extractFrom(TikaTextExtractor.java:124)
      	at org.modeshape.jcr.TextExtractors$Worker.run(TextExtractors.java:182) [modeshape-jcr-3.5-SNAPSHOT.jar:3.5-SNAPSHOT]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895) [rt.jar:1.6.0_45]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918) [rt.jar:1.6.0_45]
      	at java.lang.Thread.run(Thread.java:662) [rt.jar:1.6.0_45]
      Caused by: java.lang.ClassNotFoundException: com.drew.metadata.MetadataException from [Module "org.apache.tika:1.3" from local module loader @3336a1a1 (finder: local module finder @47ad6b4b (roots: d:\Work\hchiorean.modeshape\integration\modeshape-jbossas-integration-tests\target\jboss-eap-6.1\modules,d:\Work\hchiorean.modeshape\integration\modeshape-jbossas-integration-tests\target\jboss-eap-6.1\modules\system\layers\base))]
      	at org.jboss.modules.ModuleClassLoader.findClass(ModuleClassLoader.java:196) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.performLoadClassUnchecked(ConcurrentClassLoader.java:444) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:432) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.performLoadClassChecked(ConcurrentClassLoader.java:399) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.performLoadClass(ConcurrentClassLoader.java:374) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	at org.jboss.modules.ConcurrentClassLoader.loadClass(ConcurrentClassLoader.java:119) [jboss-modules.jar:1.2.0.Final-redhat-1]
      	... 9 more
      

      This is not specific to EAP. It affects any deployment, because ModeShape's Tika dependency explicitly excludes the third-party library that Tika uses to parse images (the one providing the com.drew.metadata classes named in the stack trace).
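
One way to confirm the diagnosis in a given environment is to check whether the class named in the stack trace can be loaded at all. The following is a minimal sketch, not part of ModeShape or Tika; the class name `ClasspathCheck` and the helper `isClassAvailable` are hypothetical names introduced for illustration.

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by the given class loader.
     * Hypothetical helper, introduced only for this illustration.
     */
    static boolean isClassAvailable(String className, ClassLoader loader) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            // The class itself is not visible to this loader
            return false;
        } catch (LinkageError e) {
            // The class was found but one of its own dependencies is missing or incompatible
            return false;
        }
    }

    public static void main(String[] args) {
        // Class reported as missing in the stack trace above
        String name = "com.drew.metadata.MetadataException";
        boolean ok = isClassAvailable(name, ClasspathCheck.class.getClassLoader());
        System.out.println(name + (ok ? " is available" : " is MISSING"));
    }
}
```

In a modular container such as the one in the trace, the result depends on which class loader is queried, so the check is most meaningful when passed the loader that loaded Tika itself, for example the loader of `org.apache.tika.parser.jpeg.JpegParser`, rather than the application's own loader.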

          People

            hchiorean Horia Chiorean (Inactive)