Infinispan / ISPN-2316

Distributed deadlock in StateTransferInterceptor


      When using transactions, a distributed deadlock may occur when a node is joining under these circumstances:

      1) The new node requests transactions from the old node using GET_TRANSACTIONS.
      2) The old node tries to commit a transaction and broadcasts a PrepareCommand; in StateTransferInterceptor it acquires the transactionLock in shared mode.
      3) The GET_TRANSACTIONS request arrives on the old node, where it waits for the transactionLock (which it requires exclusively).
      4) The transaction commit on the new node waits for the commandsLock (which it requires in shared mode), but that lock is held exclusively by onTopologyUpdate -> addTransfer -> requestTransactions (= the synchronous GET_TRANSACTIONS).
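
The wait cycle in the steps above can be reproduced in isolation with two read/write locks. This is only a minimal sketch, not Infinispan code: the class and method names are hypothetical, both "nodes" are plain threads in one JVM, and it assumes the transactionLock in steps 2 and 3 belongs to the old node while the commandsLock belongs to the new node. Each remote call is modelled as a timed lock attempt, and a barrier keeps each side holding its first lock until both attempts have finished, so the outcome does not depend on thread scheduling.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the two nodes; not part of Infinispan.
public class StateTransferDeadlockSketch {

    /** Returns {prepareCompleted, getTransactionsCompleted}. */
    public static boolean[] simulate(long timeoutMs) throws Exception {
        // Assumption: transactionLock lives on the old node, commandsLock on the new node.
        ReentrantReadWriteLock transactionLock = new ReentrantReadWriteLock();
        ReentrantReadWriteLock commandsLock = new ReentrantReadWriteLock();
        CyclicBarrier bothHoldFirstLock = new CyclicBarrier(2);
        CyclicBarrier bothAttemptsFinished = new CyclicBarrier(2);

        // Step 2 and step 4: the commit holds transactionLock (shared) on the old node,
        // then its PrepareCommand needs commandsLock (shared) on the new node.
        Callable<Boolean> commitFromOldNode = () -> {
            transactionLock.readLock().lock();
            try {
                bothHoldFirstLock.await();
                boolean acquired = commandsLock.readLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
                if (acquired) {
                    commandsLock.readLock().unlock();
                }
                bothAttemptsFinished.await();
                return acquired;
            } finally {
                transactionLock.readLock().unlock();
            }
        };

        // Step 1 and step 3: the topology update holds commandsLock (exclusive) on the new node,
        // then its synchronous GET_TRANSACTIONS needs transactionLock (exclusive) on the old node.
        Callable<Boolean> topologyUpdateOnNewNode = () -> {
            commandsLock.writeLock().lock();
            try {
                bothHoldFirstLock.await();
                boolean acquired = transactionLock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
                if (acquired) {
                    transactionLock.writeLock().unlock();
                }
                bothAttemptsFinished.await();
                return acquired;
            } finally {
                commandsLock.writeLock().unlock();
            }
        };

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<Boolean> prepare = pool.submit(commitFromOldNode);
            Future<Boolean> getTransactions = pool.submit(topologyUpdateOnNewNode);
            return new boolean[] { prepare.get(), getTransactions.get() };
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = simulate(200);
        System.out.println("PrepareCommand completed: " + result[0]);      // false
        System.out.println("GET_TRANSACTIONS completed: " + result[1]);    // false
    }
}
```

Neither side can make progress while the other holds its first lock, so both timed attempts fail. With untimed lock() calls in place of tryLock, the two threads would block forever.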

      Observed in some traces, but not required for the deadlock:
      After the transaction commit times out on the old node and releases the lock, the GET_TRANSACTIONS request may continue, but the state transfer itself can also time out if its timeout is not set sufficiently longer than the commit timeout.
      The transaction commit continues on the new node after the state transfer times out, until the transaction is found to be invalid (rolled back).

            dberinde@redhat.com Dan Berindei (Inactive)
            rvansa1@redhat.com Radim Vansa (Inactive)
