Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-3281

Deadlock in non-transactional caches during rebalance

    Details

      Description

      Say we have a cache with node A and node B joins. The cache topology id is 1, primary_owner(k) = A in the current CH and primary_owner(k) = B in the pending CH.

      1. Node A starts a put(k, v) command during the rebalance. It thinks it's the primary owner, so it acquires the lock locally and it forwards the command to B.
      2. B installs topology 2, primary_owner(k) = B in the current CH, and there is no pending CH.
      3. B receives the put(k, v) command from A. It thinks it's the primary owner, so it acquires the lock locally and it forwards the command to A.
      4. A receives the put(k, v) command from B. Again it thinks it's the primary owner and tries to acquire the lock locally, but it times out because the lock is held by another thread (from step 1).

      I think it may be enough to update the topology id in the put(k, v) command on node B, before forwarding it back to A. That way, the command will block on node A until topology 2 is installed, and it won't try to lock the key again.

        Gliffy Diagrams

          Attachments

            Activity

              People

              • Assignee:
                dan.berindei Dan Berindei
                Reporter:
                dan.berindei Dan Berindei
              • Votes:
                0 Vote for this issue
                Watchers:
                1 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: