-
Bug
-
Resolution: Done
-
Major
-
5.2.7.Final, 6.0.2.Final, 7.0.0.CR2
-
None
This is a strange one:
A pessimistic transaction started on the primary owner during state transfer can fail because a backup owner issues a ClusteredGetCommand to the primary while processing the write command. This can happen if the backup needs to perform a remote get (due to a DELTA_WRITE, conditional, or reliable return values). In this case, it's a DELTA_WRITE.
In this chain of events, state transfer is ongoing, and the union CH is (east-dht5, west-dht5, east-dht6) In this case, the pessimistic transaction originated on east-dht5 must be invoked on west-dht5 and east-dht6:
2014-10-21 02:15:01,309 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Created a new local transaction: LocalXaTransaction{xid=null} LocalTransaction{remoteLockedNodes=null, isMarkedForRollback=false, lockedKeys=null, backupKeyLocks=null, isFromStateTransfer=false} globalTx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, topologyId=33, age(ms)=0 2014-10-21 02:15:01,309 TRACE [org.infinispan.util.concurrent.locks.containers.OwnableReentrantPerEntryLockContainer] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Creating and acquiring new lock instance for key EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831] 2014-10-21 02:15:01,345 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Replication task sending PrepareCommand {modifications=[PutKeyValueCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], value=QamResource [qamId=431831, currentBandwidth=3750000, maxBandwidth=38810000, maxPrograms=255, currentPrograms=1], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true, ignorePreviousValue=false}], onePhaseCommit=true, gtx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, cacheName='qamAllocation', topologyId=33} to addresses [240-west-dht5.comcast.net-35898(CH2-Chicago-IL), 240-east-dht6.comcast.net-47049(CMC-Denver-CO)] with response mode GET_ALL
Let's examine east-dht6. It gets the write request. It needs to perform a remote get to pull the full key context, since it receives only the delta.
2014-10-21 02:15:01,362 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-2149,session-resource-cluster,240-east-dht6.comcast.net-47049(CMC-Denver-CO)) Attempting to execute command: PrepareCommand {modifications=[PutKeyValueCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], value=QamResourceDelta [qamId=431831, basePort=2049, currentBandwidth=3750000, maxBandwidth=38810000, maxPrograms=255, currentPrograms=1, baseProgramNumber=1, portStep=1, program=Program [portNumber=2053, programNumber=5, status=UNUSED, sessionToken=, lastUpdatedTime=1413872101310, bandwidth=0, inputId=-1]], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true, ignorePreviousValue=false}], onePhaseCommit=true, gtx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, cacheName='qamAllocation', topologyId=33} [sender=240-east-dht5.comcast.net-26106(CMC-Denver-CO)] ... 2014-10-21 02:15:01,362 TRACE [org.infinispan.transaction.TransactionTable] (OOB-2149,session-resource-cluster,240-east-dht6.comcast.net-47049(CMC-Denver-CO)) Created and registered remote transaction RemoteTransaction{modifications=... globalTx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:remote, topologyId=33, age(ms)=0 2014-10-21 02:15:01,362 TRACE [org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher] (OOB-2149,session-resource-cluster,240-east-dht6.comcast.net-47049(CMC-Denver-CO)) Replication task sending ClusteredGetCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE]} to addresses [240-east-dht5.comcast.net-26106(CMC-Denver-CO), 240-west-dht5.comcast.net-35898(CH2-Chicago-IL)] with response mode GET_FIRST
east-dht5 receives the ClusteredGetCommand request. Because TxDistributionInterceptor::line 325 is true, the acquireRemoteLock flag is set to true.
boolean acquireRemoteLock = false; if (ctx.isInTxScope()) { TxInvocationContext txContext = (TxInvocationContext) ctx; acquireRemoteLock = isWrite && isPessimisticCache && !txContext.getAffectedKeys().contains(key); }
Thus, when east-dht5 runs the ClusteredGetCommand. it will create and invoke a LockControlCommand. Because the LCC is created inline, it will not have its origin set. But the LCC will create a RemoteTxInvocationContext. When the LCC runs through the interceptor chain, TxInterceptor::invokeNextInterceptorAndVerifyTransaction will check the originator (because of the remote context). When it doesn't find it, it will force a rollback.
2014-10-21 02:15:01,347 TRACE [org.infinispan.remoting.InboundInvocationHandlerImpl] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Calling perform() on ClusteredGetCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE]} 2014-10-21 02:15:01,347 TRACE [org.infinispan.transaction.TransactionTable] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Created and registered remote transaction RemoteTransaction{modifications=[], lookedUpEntries={}, lockedKeys=null, backupKeyLocks=null, missingLookedUpEntries=false, isMarkedForRollback=false} globalTx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, topologyId=33, age(ms)=0 2014-10-21 02:15:01,347 TRACE [org.infinispan.statetransfer.StateTransferInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) handleTopologyAffectedCommand for command LockControlCommand{cache=qamAllocation, keys=[EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831]], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], unlock=false} 2014-10-21 02:15:01,347 TRACE [org.infinispan.interceptors.locking.PessimisticLockingInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Locking key EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], no need to check for pending locks. 2014-10-21 02:15:01,347 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) invokeNextInterceptorAndVerifyTransaction :: originatorMissing=true, alreadyCompleted=false 2014-10-21 02:15:01,347 TRACE [org.infinispan.interceptors.TxInterceptor] (OOB-1721,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) Rolling back remote transaction GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local because either already completed(false) or originator(null) no longer in the cluster(true). // ROLLBACK HAPPENS HERE
When the transaction attempts to commit, it fails due to the spurious rollback. For instance:
2014-10-21 02:15:01,384 WARN [org.infinispan.remoting.rpc.RpcManagerImpl] (OOB-1713,session-resource-cluster,240-east-dht5.comcast.net-26106(CMC-Denver-CO)) ISPN000071: Caught exception when handling command PrepareCommand {modifications=[PutKeyValueCommand{key=EdgeResourceCacheKey[edgeDeviceId=4211,resourceId=431831], value=QamResource [qamId=431831, currentBandwidth=3750000, maxBandwidth=38810000, maxPrograms=255, currentPrograms=1], flags=[SKIP_REMOTE_LOOKUP, DELTA_WRITE], putIfAbsent=false, lifespanMillis=-1, maxIdleTimeMillis=-1, successful=true, ignorePreviousValue=false}], onePhaseCommit=true, gtx=GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local, cacheName='qamAllocation', topologyId=33} org.infinispan.remoting.RemoteException: ISPN000217: Received exception from 240-west-dht5.comcast.net-35898(CH2-Chicago-IL), see cause for remote stack trace ... at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:124) at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:343) at org.infinispan.transaction.TransactionCoordinator.commit(TransactionCoordinator.java:175) at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:122) ... com.arjuna.ats.internal.jta.resources.arjunacore.XAResourceRecord.topLevelOnePhaseCommit(XAResourceRecord.java:636) at com.arjuna.ats.arjuna.coordinator.BasicAction.onePhaseCommit(BasicAction.java:2619) at com.arjuna.ats.arjuna.coordinator.BasicAction.End(BasicAction.java:1779) at com.arjuna.ats.arjuna.coordinator.TwoPhaseCoordinator.end(TwoPhaseCoordinator.java:88) at com.arjuna.ats.arjuna.AtomicAction.commit(AtomicAction.java:177) Caused by: org.infinispan.transaction.xa.InvalidTransactionException: This remote transaction GlobalTransaction:<240-east-dht5.comcast.net-26106(CMC-Denver-CO)>:1575124:local is already rolled back at org.infinispan.transaction.RemoteTransaction.checkIfRolledBack(RemoteTransaction.java:137) at org.infinispan.transaction.RemoteTransaction.putLookedUpEntry(RemoteTransaction.java:73) at org.infinispan.context.impl.RemoteTxInvocationContext.putLookedUpEntry(RemoteTxInvocationContext.java:99) at org.infinispan.container.EntryFactoryImpl.wrapInternalCacheEntryForPut(EntryFactoryImpl.java:242) at org.infinispan.container.EntryFactoryImpl.wrapEntryForPut(EntryFactoryImpl.java:166) ...
So it looks like there's a couple ways to handle this. One would be to only acquire a remote lock in TxDistributionInterceptor if the current node was the originator. Another – possibly less intrusive – would be to set the origin of the invoked LCC command to that of the ClusteredGetCommand. The former should be preferable, but this code tends to be a bit labyrinthine with the various pessimistic use cases out there.