Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-5955

FAIL_SILENTLY doesn't always prevent rollback with pessimistic transactions

XMLWordPrintable

      The FAIL_SILENTLY should "prevent a failing operation from affecting any ongoing JTA transactions", but sometimes this doesn't work properly.

      FlagsReplicationTest has a transaction using both FAIL_SILENTLY and SKIP_LOCKING in the same transaction:

      1. A lock(FAIL_SILENTLY)(key) operation fails.
        Both on the primary owner and on the originator, PessimisticLockingInterceptor sends a TxCompletionNotificationCommand to all the nodes affected by the tx so far, to release the locks. This marks the transaction as completed, but doesn't mark it as rolled back.
      2. A remove(SKIP_LOCKING)(key) operation should then succeed.
        But the operation will invoke a ClusteredGetCommand(key, acquireRemoteLock=true, SKIP_LOCKING) on both owners, which will in turn invoke locally a LockControlCommand(key, SKIP_LOCKING).
        At this point, TxInterceptor sees that the transaction is already completed, and invokes a RollbackCommand to mark it as rolled back, then remove it from the transaction table. The command still succeeds.
      3. The test then tries to commit the transaction. Usually, the PrepareCommand doesn't find the remote transaction in the transaction table, and it succeeds.
        But some of the time, because the ClusteredGetCommand command only uses the first response, the remote transaction is not removed from the transaction table by the time the PrepareCommand is executed on one of the owners.
        If that happens, the PrepareCommand executes with a remote transaction instance that's already marked for rollback, and fails when trying to put the key in the context.
      15:42:56,736 ERROR (remote-thread-NodeF-p1112-t5:GlobalTx:NodeG-6318:216) [InvocationContextInterceptor] ISPN000136: Error executing command LockControlCommand, writing keys []
      org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key replication.FlagsReplicationTest and requestor GlobalTx:NodeG-6318:216. Lock is held by GlobalTx:NodeG-6318:215
      
      15:42:56,802 TRACE (OOB-1,NodeF-64741:) [NonTotalOrderTxPerCacheInboundInvocationHandler] Calling perform() on ClusteredGetCommand{key=replication.FlagsReplicationTest, flags=[SKIP_LOCKING]}
      15:42:56,803 TRACE (OOB-1,NodeF-64741:GlobalTx:NodeG-6318:216) [InvocationContextInterceptor] Invoked with command LockControlCommand{cache=dist, keys=[replication.FlagsReplicationTest], flags=[SKIP_LOCKING], unlock=false, gtx=GlobalTx:NodeG-6318:216} and InvocationContext [org.infinispan.context.impl.RemoteTxInvocationContext@75bf6da7]
      
      15:42:56,822 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [InvocationContextInterceptor] Invoked with command PrepareCommand {modifications=[RemoveCommand{key=replication.FlagsReplicationTest, value=null, flags=[SKIP_LOCKING], valueMatcher=MATCH_ALWAYS}], onePhaseCommit=true, gtx=GlobalTx:NodeG-6318:216, cacheName='dist', topologyId=6} and InvocationContext [org.infinispan.context.impl.RemoteTxInvocationContext@75bf6da7]
      
      15:42:56,832 TRACE (OOB-1,NodeF-64741:GlobalTx:NodeG-6318:216) [TxInterceptor] Rolling back remote transaction GlobalTx:NodeG-6318:216 because either already completed (true) or originator no longer in the cluster (false).
      
      15:42:56,864 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [EntryFactoryImpl] Creating new entry for key replication.FlagsReplicationTest
      15:42:56,864 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [BaseSequentialInvocationContext] Interceptor class org.infinispan.interceptors.EntryWrappingInterceptor threw exception org.infinispan.transaction.xa.InvalidTransactionException: This remote transaction GlobalTx:NodeG-6318:216 is already rolled back
      
      15:42:56,873 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [TxInterceptor] invokeNextInterceptorAndVerifyTransaction :: originatorMissing=false, alreadyCompleted=true
      15:42:56,873 TRACE (remote-thread-NodeF-p1112-t6:GlobalTx:NodeG-6318:216) [TxInterceptor] Rolling back remote transaction GlobalTx:NodeG-6318:216 because either already completed (true) or originator no longer in the cluster (false).
      

      Possible actions:

      • Never try to release locks from a LockControlCommand, wait for the RollbackCommand instead. This could cause problems if the primary owner dies, the transaction tries to lock the same entry again, and the backup owner that became primary wrongfully assumes that it is a proper owner.
      • Use a LockControlCommand(unlock=true) instead of a TxCompletionNotificationCommand to release the backup locks after an operation with FAIL_SILENTLY failed.
      • Don't set acquireRemoteLock=true when the SKIP_LOCKING flag has been set.

              dberinde@redhat.com Dan Berindei (Inactive)
              dberinde@redhat.com Dan Berindei (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: