Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-5460

Prepare commands sent before the target became a member should be rejected

    XMLWordPrintable

    Details

      Description

      Since ISPN-4198 was fixed, joiners ignore commands that were sent in a topology in which they were not members. But the joiner will still send a null response, which is valid, so the tx originator will not retry the command.

      The owner may have already sent the list of transactions to the joiner before the prepare, so this sequence of events can happen:

      1. A new topology is installed, which includes the joiner (B) in the write consistent hash. B requests the transactions from A, but doesn't receive anything because the affected keys of GlobalTransaction:<NodeA-40680>:99974:local haven't been updated yet.

      19:14:11,027 TRACE (remote-thread-NodeB-p35288-t2:) [StateConsumerImpl] Received new topology for cache resultCache2, isRebalance = false, isMember = false, topology = CacheTopology{id=0, rebalanceId=0, currentCH=ReplicatedConsistentHash{ns = 1, owners = (1)[NodeA-40680: 1]}, pendingCH=null, unionCH=null, actualMembers=[NodeA-40680]}
      19:14:11,039 TRACE (transport-thread-NodeA-p35281-t6:) [StateTransferInterceptor] handleTxCommand for command PrepareCommand {modifications=[PutKeyValueCommand{key=rules, value=1, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2', topologyId=-1}, origin null
      19:14:11,044 TRACE (transport-thread-NodeA-p35281-t1:) [StateConsumerImpl] Received new topology for cache resultCache2, isRebalance = true, isMember = true, topology = CacheTopology{id=1, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 1, owners = (1)[NodeA-40680: 1]}, pendingCH=ReplicatedConsistentHash{ns = 1, owners = (2)[NodeA-40680: 1, NodeB-64486: 0]}, unionCH=ReplicatedConsistentHash{ns = 1, owners = (2)[NodeA-40680: 1, NodeB-64486: 0]}, actualMembers=[NodeA-40680, NodeB-64486]}
      19:14:11,045 TRACE (remote-thread-NodeA-p35282-t4:) [StateProviderImpl] Received request for transactions from node NodeB-64486 for segments [0] of cache resultCache2 with topology id 1
      19:14:11,045 TRACE (remote-thread-NodeA-p35282-t4:) [StateProviderImpl] Skipping transaction NodeB-64486 because the state requestor doesn't own any key
      19:14:11,045 TRACE (remote-thread-NodeA-p35282-t4:) [CommandAwareRpcDispatcher] About to send back response SuccessfulResponse{responseValue=[]}  for command StateRequestCommand{cache=resultCache2, origin=NodeB-64486, type=GET_TRANSACTIONS, topologyId=1, segments=[0]}
      19:14:11,045 DEBUG (transport-thread-NodeB-p35287-t3:) [StateConsumerImpl] Applying 0 transactions for cache resultCache2 transferred from node NodeA-40680
      

      2. Node A adds the affected key and sends the prepare command to B with topology 0. B ignores the command.

      19:14:11,039 TRACE (transport-thread-NodeA-p35281-t6:) [AbstractCacheTransaction] Registering locked key: rules
      19:14:11,047 TRACE (transport-thread-NodeA-p35281-t6:) [JGroupsTransport] dests=null, command=PrepareCommand {modifications=[PutKeyValueCommand{key=rules, value=1, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2', topologyId=0}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=15000
      19:14:11,048 TRACE (OOB-4,NodeB-64486:) [GlobalInboundInvocationHandler] Attempting to execute CacheRpcCommand: PrepareCommand {modifications=[PutKeyValueCommand{key=rules, value=1, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2', topologyId=0} [sender=NodeA-40680]
      19:14:11,048 TRACE (remote-thread-NodeB-p35288-t6:) [NonTotalOrderPerCacheInboundInvocationHandler] Ignoring command sent before the local node was a member (command topology id is 0)
      19:14:11,048 TRACE (remote-thread-NodeB-p35288-t6:) [CommandAwareRpcDispatcher] About to send back response null for command PrepareCommand {modifications=[PutKeyValueCommand{key=rules, value=1, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-40680>:99974:remote, cacheName='resultCache2', topologyId=0}
      19:14:11,058 TRACE (transport-thread-NodeA-p35281-t6:) [RpcManagerImpl] Response(s) to PrepareCommand {modifications=[PutKeyValueCommand{key=rules, value=1, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2', topologyId=0} is {}
      

      3. A tries to commit the tx, but B throws an exception because it can't find the remote tx.

      19:14:11,066 TRACE (transport-thread-NodeA-p35281-t6:) [LocalTransaction] Adding remote locks on [NodeA-40680, NodeB-64486]. Remote locks are null
      19:14:11,066 TRACE (transport-thread-NodeA-p35281-t6:) [TransactionCoordinator] Committing transaction GlobalTransaction:<NodeA-40680>:99974:local
      19:14:11,066 TRACE (transport-thread-NodeA-p35281-t6:) [RpcManagerImpl] NodeA-40680 invoking CommitCommand {gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2', topologyId=2} to recipient list null with options RpcOptions{timeout=15000, unit=MILLISECONDS, deliverOrder=NONE, responseFilter=null, responseMode=SYNCHRONOUS_IGNORE_LEAVERS, skipReplicationQueue=false}
      19:14:11,066 TRACE (OOB-5,NodeB-64486:) [GlobalInboundInvocationHandler] Attempting to execute CacheRpcCommand: CommitCommand {gtx=GlobalTransaction:<NodeA-40680>:99974:local, cacheName='resultCache2', topologyId=2} [sender=NodeA-40680]
      19:14:11,066 TRACE (remote-thread-NodeB-p35288-t6:) [NonTotalOrderPerCacheInboundInvocationHandler] Calling perform() on CommitCommand {gtx=GlobalTransaction:<NodeA-40680>:99974:remote, cacheName='resultCache2', topologyId=2}
      19:14:11,066 TRACE (remote-thread-NodeB-p35288-t6:) [AbstractTransactionBoundaryCommand] Did not find a RemoteTransaction for GlobalTransaction:<NodeA-40680>:99974:remote
      19:14:11,067 WARN  (remote-thread-NodeB-p35288-t6:) [NonTotalOrderPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command CommitCommand {gtx=GlobalTransaction:<NodeA-40680>:99974:remote, cacheName='resultCache2', topologyId=2}
      java.lang.IllegalStateException: Remote transaction not found: GlobalTransaction:<NodeA-40680>:99974:remote
      	at org.infinispan.commands.tx.CommitCommand.invalidRemoteTxReturnValue(CommitCommand.java:54)
      	at org.infinispan.commands.tx.AbstractTransactionBoundaryCommand.perform(AbstractTransactionBoundaryCommand.java:89)
      	at org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokePerform(BasePerCacheInboundInvocationHandler.java:84)
      	at org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:31)
      

      This is causing random failures in the map/reduce tests (e.g. DistributedSharedCacheTwoNodesMapReduceTest), because map/reduce doesn't wait for all the nodes to join before inserting in the cache.

        Attachments

          Activity

            People

            Assignee:
            dan.berindei Dan Berindei
            Reporter:
            dan.berindei Dan Berindei
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: