Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-8980

High concurrency : Infinispan Directory Provider: Lucene : Error loading metadata for index file

    Details

    • Type: Bug
    • Status: New (View Workflow)
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 8.2.5.Final
    • Fix Version/s: None
    • Component/s: Lucene Directory
    • Labels:
      None

      Description

      During high concurrency of action, we are getting 'Error loading metadata for index file' even in Non-Clustered env.

      Hibernate Search Indexes (Lucene Indexes) - 5.7.0.Final
      Infinispan - 8.2.5.Final
      infinispan-directory-provider-8.2.5.Final
      jgroups-3.6.7.Final

      Worker Backend : JGroups
      Worker Execution: Sync
      write_metadata_async: false (implicitly)

      Note: Currently we are on Non-Clustered env. We are moving to Clustered Env within few days.

      On analyzing the code, and putting some additional SYSOUT loggers into FileListOperations and DirectoryImplementor classes, we have established the following points:

      1. This is happening during high concurrency on non-clustered env.
      2. One thread 'T1' is deleting a segment and segment name 'SEG1' from the 'FileListCacheKey' list* stored in MetaDatacache*.
      3. Concurrently, at the same time, another thread 'T2' is looping through the FileList ['copy list' from MetadataCache - for -FileListCacheKey - provided by toArray method of *FileListOperations* (changes also being done in the corresponding original list by T1 thread) ].
      4. 'T2' is calling open input method on each segment name - getting corresponding Metadata segment from MetadataCache.
      5. However, for 'T2', the 'copy list' still contains the name of segment 'SEG1'.
      6. So while looping through the list, 'T2' tries to get Segment from MetadataCache for segment name 'SEG1'.
      7. But at this instant, segment corresponding to segment name 'SEG1', has been already removed from MetadataCache by 'T1'.
      8. This results in 'java.io.FileNotFoundException: Error loading metadata for index file' for segment name 'SEG1'
      9. As mentioned earlier, this happens more often during high concurrency.

      On a standalone server (non-clustered), we are getting below error intermittently:
      Full Stack trace:

      2018-03-19 17:29:11,938 ERROR [Hibernate Search sync consumer thread for index com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer] o.h.s.e.i.LogErrorHandler [LogErrorHandler.java:69]
      HSEARCH000058: Exception occurred java.io.FileNotFoundException: Error loading metadata for index file: M|segments_w6|com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer|-1
      Primary Failure:
      Entity com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer Id 1649990024999813056 Work Type org.hibernate.search.backend.AddLuceneWork

      java.io.FileNotFoundException: Error loading metadata for index file: M|segments_w6|com.nucleus.integration.ws.server.globalcustomer.entity.GlobalCustomer|-1
      at org.infinispan.lucene.impl.DirectoryImplementor.openInput(DirectoryImplementor.java:138) ~[infinispan-lucene-directory-8.2.5.Final.jar:8.2.5.Final]
      at org.infinispan.lucene.impl.DirectoryLucene.openInput(DirectoryLucene.java:102) ~[infinispan-lucene-directory-8.2.5.Final.jar:8.2.5.Final]
      at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109) ~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz - 2017-02-08 19:08:03]
      at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:294) ~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz - 2017-02-08 19:08:03]
      at org.apache.lucene.index.IndexFileDeleter.<init>(IndexFileDeleter.java:171) ~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz - 2017-02-08 19:08:03]
      at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:949) ~[lucene-core-5.5.4.jar:5.5.4 31012120ebbd93744753eb37f1dbc5e654628291 - jpountz - 2017-02-08 19:08:03]
      at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.createNewIndexWriter(IndexWriterHolder.java:126) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at org.hibernate.search.backend.impl.lucene.IndexWriterHolder.getIndexWriter(IndexWriterHolder.java:92) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at org.hibernate.search.backend.impl.lucene.AbstractCommitPolicy.getIndexWriter(AbstractCommitPolicy.java:33) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at org.hibernate.search.backend.impl.lucene.SharedIndexCommitPolicy.getIndexWriter(SharedIndexCommitPolicy.java:77) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at org.hibernate.search.backend.impl.lucene.SharedIndexWorkspaceImpl.getIndexWriter(SharedIndexWorkspaceImpl.java:36) ~[hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at org.hibernate.search.backend.impl.lucene.AbstractWorkspaceImpl.getIndexWriterDelegate(AbstractWorkspaceImpl.java:203) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.applyUpdates(LuceneBackendQueueTask.java:81) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at org.hibernate.search.backend.impl.lucene.LuceneBackendQueueTask.run(LuceneBackendQueueTask.java:46) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.applyChangesets(SyncWorkProcessor.java:165) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at org.hibernate.search.backend.impl.lucene.SyncWorkProcessor$Consumer.run(SyncWorkProcessor.java:151) [hibernate-search-engine-5.7.0.Final.jar:5.7.0.Final]
      at java.lang.Thread.run(Thread.java:785) [na:1.8.0-internal]

      As per our understanding, this issue should not come in 'non-clustered' env. Also it should not arise when worker execution is 'sync'.
      We have debugged the code, and confirmed that the value for 'write_metadata_async' is coming as 'false' only (as expected).

        Gliffy Diagrams

          Attachments

          1. JoiningNode_N2.zip
            439 kB
          2. neutrino-hibernatesearch-infinispan.xml
            4 kB
          3. neutrino-hibernate-search-worker-jgroups.xml
            3 kB
          4. OriginalNode_N1.zip
            208 kB
          5. SysOutLogs.txt
            4 kB
          6. TestResult-StoreIndexReplTest.txt
            9 kB

            Issue Links

              Activity

                People

                • Assignee:
                  gustavonalle Gustavo Fernandes
                  Reporter:
                  debashish.bharali Debashish Bharali
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  5 Start watching this issue

                  Dates

                  • Created:
                    Updated: