Infinispan / ISPN-2966

NBST: Concurrent leavers can lead to deadlock



      The following sequence of events leads to a thread deadlock in the coordinator:

      1) NodeF sends a LEAVE message; the new topologyId is 8
      2) NodeE delivers REBALANCE_START(8)
      3) NodeF and NodeG deliver REBALANCE_START(8)
      4) NodeH delivers GET_TRANSACTION(8) from NodeE ==> Transactions were requested by node ConcurrentNonOverlappingLeaveTest-NodeE-28744 with topology 8, greater than the local topology (7). Waiting for topology 8 to be installed locally.
      5) NodeH sends a LEAVE message; the new topologyId is 9
      6) NodeH delivers REBALANCE_START(8) ==> Ignoring rebalance 8 for cache dist that doesn't exist locally
      7) NodeH delivers GET_TRANSACTION(8) from NodeG ==> Transactions were requested by node ConcurrentNonOverlappingLeaveTest-NodeG-31669 with topology 8, greater than the local topology (7). Waiting for topology 8 to be installed locally.
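
      The reason the waits in steps 4 and 7 can never finish may be easier to see in a small model. The sketch below is only an illustration: LeavingNodeModel and all of its methods are hypothetical names, not Infinispan classes, and the model assumes that a node serves a transaction request only once its local topology is at least the requested one.

```java
/** Hypothetical model of NodeH in steps 4-7; not Infinispan's actual API. */
public class LeavingNodeModel {
    private int localTopologyId;
    private boolean cacheRunning = true;

    public LeavingNodeModel(int localTopologyId) {
        this.localTopologyId = localTopologyId;
    }

    /** Step 5: the node leaves, so the cache no longer exists locally. */
    public void leave() {
        cacheRunning = false;
    }

    /**
     * Step 6: a rebalance for a cache that does not exist locally is ignored.
     * Returns true only if the topology was actually installed.
     */
    public boolean onRebalanceStart(int topologyId) {
        if (!cacheRunning) {
            return false; // ignored, local topology stays unchanged
        }
        localTopologyId = Math.max(localTopologyId, topologyId);
        return true;
    }

    /** Steps 4 and 7: the request can be served only at or after the requested topology. */
    public boolean canServeTransactionRequest(int requestedTopologyId) {
        return localTopologyId >= requestedTopologyId;
    }
}
```

      With new LeavingNodeModel(7), a request for topology 8 cannot be served; after leave(), onRebalanceStart(8) is ignored, so the request still cannot be served and any thread blocked on it waits indefinitely.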
      

      Possible solutions are:

      • Send the REBALANCE_START/CH_UPDATE commands asynchronously.
      • Throw an exception when a GET_TRANSACTION is received while the node is shutting down.
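
      The second option could look roughly like the sketch below. It is a minimal model under stated assumptions, not the actual fix: TopologyGate, installTopology, beginShutdown and awaitTopology are hypothetical names, and the real state-transfer code is organized differently. The idea is that a thread waiting for a topology is woken up and fails fast once the node starts shutting down, instead of waiting for a topology that will never be installed locally.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical "wait for topology N" gate that fails fast during shutdown. */
public class TopologyGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private int localTopologyId;
    private boolean stopping;

    public TopologyGate(int initialTopologyId) {
        this.localTopologyId = initialTopologyId;
    }

    /** Installs a newer topology and wakes up all waiters. */
    public void installTopology(int topologyId) {
        lock.lock();
        try {
            localTopologyId = Math.max(localTopologyId, topologyId);
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Marks the node as shutting down and wakes up all waiters so they can fail. */
    public void beginShutdown() {
        lock.lock();
        try {
            stopping = true;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Waits until the requested topology is installed.
     * Returns true if it was installed, false if the timeout elapsed first.
     * Throws IllegalStateException if the node is shutting down or the thread is interrupted.
     */
    public boolean awaitTopology(int requestedTopologyId, long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (true) {
                if (stopping) {
                    throw new IllegalStateException(
                        "Node is shutting down; topology " + requestedTopologyId
                            + " will not be installed locally");
                }
                if (localTopologyId >= requestedTopologyId) {
                    return true;
                }
                if (nanos <= 0) {
                    return false;
                }
                nanos = changed.awaitNanos(nanos);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while waiting for topology", e);
        } finally {
            lock.unlock();
        }
    }
}
```

      In this model, the handler for GET_TRANSACTION(8) on NodeH would call awaitTopology(8, timeout); once NodeH calls beginShutdown() in step 5, the waiting thread receives an exception that can be reported back to the requester, rather than blocking the coordinator.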

      Attachments:

        1. trace.log (474 kB)
        2. thread-dump.txt (128 kB)

              dberinde@redhat.com Dan Berindei (Inactive)
              pruivo@redhat.com Pedro Ruivo
              Archiver:
              rhn-support-adongare Amol Dongare
