Infinispan / ISPN-6350

Data race in the ShardIndexManager under topology changes

      The following example of a data race can cause unrecoverable errors during indexing:

      [node1] cache.put(key) // key maps to segment 48, owned by node1
      [node1] starts shard 48
      [node1] acquires lock on shard 48
      [node1] starts writing to the index
      [node1] notification of topology changed, lock released on shard 48
      [node1] lock reacquired (still writing to the index)
      [node1] commit on shard 48
      [node1] shard still locked
      [node2] cache.put(key) // Node2 now owns segment 48
      [node2] starts shard 48
      [node2] tries to acquire the lock on shard 48
      [node2] fail (lock still owned by node1)

      During topology changes, the ShardIndexManager currently uses a listener that closes the IndexWriter on all nodes whenever ownership changes, so that the lock is released and can be reacquired by the new owner (1 segment maps to 1 shard).
      Since writing to a shard can take some time, the listener can be triggered in the middle of an index operation. In that case the IndexWriter stays closed only very briefly: the in-flight operation immediately reacquires the lock, and the lock is never released again.
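
      The interleaving above can be replayed deterministically with a small model. This is only a sketch of the failure mode, not the ShardIndexManager code: `ShardLock`, `ShardWriter`, `ensureOpen` and `replay` are hypothetical names, and the assumption is that an in-flight write lazily reopens the writer (and so retakes the lock) after the topology listener has closed it.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ShardLockRace {

    /** Hypothetical stand-in for the per-shard index lock: at most one owner at a time. */
    static final class ShardLock {
        private final AtomicReference<String> owner = new AtomicReference<>();

        boolean tryAcquire(String node) {
            return owner.compareAndSet(null, node);
        }

        void release(String node) {
            owner.compareAndSet(node, null);
        }

        String owner() {
            return owner.get();
        }
    }

    /** Hypothetical writer that takes the shard lock whenever it needs to be open. */
    static final class ShardWriter {
        private final String node;
        private final ShardLock lock;
        private boolean open;

        ShardWriter(String node, ShardLock lock) {
            this.node = node;
            this.lock = lock;
        }

        /** Opens the writer if needed; returns false when the lock is held by another node. */
        boolean ensureOpen() {
            if (!open) {
                open = lock.tryAcquire(node);
            }
            return open;
        }

        /** Closes the writer and releases the lock, as the topology listener does. */
        void close() {
            if (open) {
                lock.release(node);
                open = false;
            }
        }
    }

    /** Replays the trace; returns the node blocking node2, or null if node2 got the lock. */
    static String replay() {
        ShardLock shard48 = new ShardLock();
        ShardWriter node1 = new ShardWriter("node1", shard48);
        ShardWriter node2 = new ShardWriter("node2", shard48);

        node1.ensureOpen(); // node1 acquires the lock and starts writing
        node1.close();      // topology change: listener closes the writer mid-operation
        node1.ensureOpen(); // the in-flight write reopens the writer, lock reacquired
        // node1 commits; nothing closes the writer again, so the lock stays held

        boolean node2Acquired = node2.ensureOpen(); // node2 now owns segment 48
        return node2Acquired ? null : shard48.owner();
    }

    public static void main(String[] args) {
        System.out.println("lock owner blocking node2: " + replay());
    }
}
```

      In this model the single close triggered by the listener has no lasting effect, because nothing prevents the old owner from reopening the writer afterwards. That is the behaviour described above.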

              gfernand@redhat.com Gustavo Fernandes (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare
