  1. Infinispan
  2. ISPN-2966

NBST: Concurrent leavers can lead to deadlock

Details

    Description

      The following sequence of events leads to a thread deadlock in the coordinator:

      1) NodeF sends a LEAVE message; the new topologyId is 8.
      2) NodeE delivers REBALANCE_START(8).
      3) NodeF and NodeG deliver REBALANCE_START(8).
      4) NodeH delivers GET_TRANSACTION(8) from NodeE ==> Transactions were requested by node ConcurrentNonOverlappingLeaveTest-NodeE-28744 with topology 8, greater than the local topology (7). Waiting for topology 8 to be installed locally.
      5) NodeH sends a LEAVE message; the new topologyId is 9.
      6) NodeH delivers REBALANCE_START(8) ==> Ignoring rebalance 8 for cache dist that doesn't exist locally
      7) NodeH delivers GET_TRANSACTION(8) from NodeG ==> Transactions were requested by node ConcurrentNonOverlappingLeaveTest-NodeG-31669 with topology 8, greater than the local topology (7). Waiting for topology 8 to be installed locally.
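
      The log lines in steps 4 and 7 show a request handler waiting until a newer topology is installed locally. The following is a minimal Java sketch of that kind of wait, not Infinispan's actual code: the class TopologyWaitSketch, its methods, and the timeout are all hypothetical, and the timeout exists only so the example terminates. It illustrates how, if the rebalance that would install topology 8 is ignored (as in step 6), nothing is left to wake the waiting requests.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model, not Infinispan code: a node that parks requests
// until the requested topology has been installed locally.
class TopologyWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition topologyInstalled = lock.newCondition();
    private int localTopologyId; // guarded by lock

    TopologyWaitSketch(int initialTopologyId) {
        this.localTopologyId = initialTopologyId;
    }

    // Called only when a rebalance is actually applied locally.
    void installTopology(int topologyId) {
        lock.lock();
        try {
            if (topologyId > localTopologyId) {
                localTopologyId = topologyId;
                topologyInstalled.signalAll(); // wake every parked request
            }
        } finally {
            lock.unlock();
        }
    }

    // Returns true once the requested topology is installed,
    // false on timeout or interruption (the timeout is an assumption
    // added for this sketch so that it cannot block forever).
    boolean awaitTopology(int requestedTopologyId, long timeoutMillis) {
        long remainingNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (localTopologyId < requestedTopologyId) {
                if (remainingNanos <= 0L) {
                    return false;
                }
                remainingNanos = topologyInstalled.awaitNanos(remainingNanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

      With a node created at topology 7, awaitTopology(8, ...) returns false for as long as installTopology(8) is never called, which is the situation NodeH ends up in after step 6.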
      

      Possible solutions are:

      • Send the REBALANCE_START/CH_UPDATE commands asynchronously.
      • Throw an exception when a GET_TRANSACTION request is received while the node is shutting down.
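
      The second option could look roughly like the sketch below. This is only an illustration under assumptions: ShutdownAwareHandlerSketch, NodeShuttingDownException, Outcome, beginLeave and onGetTransaction are hypothetical names and do not correspond to Infinispan classes or methods. The idea is that a node which has already started leaving rejects the request immediately instead of waiting for a topology it will never install.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not Infinispan code: fail fast on transaction
// requests once the local node has started to leave the cluster.
class ShutdownAwareHandlerSketch {
    enum Outcome { SERVE_NOW, WAIT_FOR_TOPOLOGY }

    // Hypothetical exception type used to signal the requestor.
    static class NodeShuttingDownException extends RuntimeException {
        NodeShuttingDownException(String message) {
            super(message);
        }
    }

    private final AtomicBoolean shuttingDown = new AtomicBoolean(false);
    private final int localTopologyId;

    ShutdownAwareHandlerSketch(int localTopologyId) {
        this.localTopologyId = localTopologyId;
    }

    // Marks the node as leaving; called before the LEAVE message is sent.
    void beginLeave() {
        shuttingDown.set(true);
    }

    // Decides what to do with a transaction request for a given topology.
    Outcome onGetTransaction(String requestor, int requestedTopologyId) {
        if (shuttingDown.get()) {
            // Reject instead of parking the request thread.
            throw new NodeShuttingDownException(
                "Rejecting request from " + requestor + " for topology "
                    + requestedTopologyId + ": node is shutting down");
        }
        return requestedTopologyId > localTopologyId
            ? Outcome.WAIT_FOR_TOPOLOGY
            : Outcome.SERVE_NOW;
    }
}
```

      In the scenario above, the request in step 4 arrives before NodeH starts leaving and would still wait, so a real fix would presumably also need to release requests that are already waiting when the node begins to shut down. The request in step 7 would be rejected right away.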

      Attachments

        1. thread-dump.txt
          128 kB
        2. trace.log
          474 kB

            People

              dberinde@redhat.com Dan Berindei (Inactive)
              pruivo@redhat.com Pedro Ruivo
              Votes: 0
              Watchers: 5
