Infinispan / ISPN-5168

Recovery: force commit on an orphan tx unlocks remote keys too soon


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Major
    • Fix Version/s: None
    • Affects Version/s: 7.0.3.Final, 7.1.0.Beta1
    • Component/s: Core

      The force commit admin operation replays the PrepareCommand on all the owners to acquire any missing locks, but the prepare is a no-op if the tx already exists and is already marked as prepared on the remote nodes.
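
      A minimal Java sketch of that no-op, under stated assumptions: `ForcePrepareSketch`, `RemoteTx` and `replayPrepare` are hypothetical names, not Infinispan classes, and one owner's remote transaction table is reduced to a map from a tx id to a "prepared" flag plus the set of keys it has locked.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of one owner's remote-tx table.
// None of these types are Infinispan classes; they only mirror the behaviour described above.
public class ForcePrepareSketch {

    static final class RemoteTx {
        boolean prepared;
        final Set<String> lockedKeys = new HashSet<>();
    }

    static final Map<String, RemoteTx> remoteTxs = new HashMap<>();

    // Replayed PrepareCommand: acquires locks only if the tx is unknown or not yet prepared.
    static boolean replayPrepare(String gtx, Set<String> keys) {
        RemoteTx tx = remoteTxs.get(gtx);
        if (tx != null && tx.prepared) {
            return false; // no-op: the tx exists and is prepared, so no locks are (re)acquired
        }
        if (tx == null) {
            tx = new RemoteTx();
            remoteTxs.put(gtx, tx);
        }
        tx.lockedKeys.addAll(keys);
        tx.prepared = true;
        return true;
    }

    public static void main(String[] args) {
        // Assumed scenario: the owner already knows the tx as prepared, but the lock on aKey is missing.
        RemoteTx existing = new RemoteTx();
        existing.prepared = true;
        remoteTxs.put("gtx-2", existing);

        boolean acted = replayPrepare("gtx-2", Set.of("aKey"));
        System.out.println("prepare acted=" + acted + ", locks=" + existing.lockedKeys);
    }
}
```

      Running `main` prints `prepare acted=false, locks=[]`: the replayed prepare returns early, so the missing lock is never acquired.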

      However, when executing the CommitCommand, TxInterceptor notices that the existing remote tx has an older topology id than the command and replays the PrepareCommand. If the originator of the tx has left the cluster, TxInterceptor.invokeNextInterceptorAndVerifyTransaction() then rolls back the tx and unlocks all its keys. Because it does not throw an exception, the commit still succeeds and applies the modifications, but without holding any locks.
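
      The commit path can be modelled with a short Java sketch. This is a hypothetical simplification, not Infinispan code: `OrphanCommitSketch`, `RemoteTx`, `replayPrepareAndVerify` and `commit` are illustrative names, only the "originator missing" branch of the verification is modelled, and the node names and topology ids are taken from the log below.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the commit path on a remote owner (NodeE in the log).
// Names loosely mirror the report; none of this is Infinispan's real API.
public class OrphanCommitSketch {

    static final class RemoteTx {
        int topologyId;
        final Set<String> lockedKeys = new HashSet<>();
        final Map<String, String> modifications = new HashMap<>();
    }

    static final Set<String> clusterMembers = new HashSet<>();
    static final Map<String, String> dataContainer = new HashMap<>();

    // Simplified stand-in for the replayed prepare plus verification: if the originator
    // has left, the tx is rolled back (all its locks released) but no exception is thrown.
    static void replayPrepareAndVerify(RemoteTx tx, String originator) {
        boolean originatorMissing = !clusterMembers.contains(originator);
        if (originatorMissing) {
            tx.lockedKeys.clear(); // rollback unlocks every key
        }
    }

    // Applies the commit and returns the keys that were written without a lock.
    static Set<String> commit(RemoteTx tx, String originator, int commandTopologyId) {
        if (tx.topologyId < commandTopologyId) {
            replayPrepareAndVerify(tx, originator); // older remote tx: replay the prepare
        }
        Set<String> unlockedWrites = new HashSet<>();
        for (Map.Entry<String, String> e : tx.modifications.entrySet()) {
            if (!tx.lockedKeys.contains(e.getKey())) {
                unlockedWrites.add(e.getKey());
            }
            dataContainer.put(e.getKey(), e.getValue()); // the commit still applies the write
        }
        return unlockedWrites;
    }

    public static void main(String[] args) {
        clusterMembers.add("NodeD");
        clusterMembers.add("NodeE"); // the originator NodeF has already left
        RemoteTx tx = new RemoteTx();
        tx.topologyId = 4;
        tx.lockedKeys.add("aKey");
        tx.modifications.put("aKey", "newValue");

        Set<String> unsafe = commit(tx, "NodeF", 5);
        System.out.println("value=" + dataContainer.get("aKey") + ", written without lock=" + unsafe);
    }
}
```

      In this model `main` prints `value=newValue, written without lock=[aKey]`, matching the order in the trace: the lock on aKey is released before the entry is updated.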

      10:38:51,313 TRACE (testng-OriginatorAndOwnerFailureReplicationTest:) [JGroupsTransport] dests=[OriginatorAndOwnerFailureReplicationTest-NodeD-50040, OriginatorAndOwnerFailureReplicationTest-NodeE-44976], command=PrepareCommand {modifications=[PutKeyValueCommand{key=aKey, value=newValue, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=RecoveryAwareGlobalTransaction{xid=< 1, 64, 64, -12-63-13-63-32-39-44-29-891-73-111-107-75-113-88-108-59-88120000000000000000000000000000000000000000000, -12-63-13-63-32-39-44-29-891-73-111-107-75-113-88-108-59-88120000000000000000000000000000000000000000000 >, internalId=562962838323201} GlobalTransaction:<OriginatorAndOwnerFailureReplicationTest-NodeD-50040>:2:local, cacheName='___defaultcache', topologyId=5}, mode=SYNCHRONOUS, timeout=15000
      10:38:51,319 TRACE (testng-OriginatorAndOwnerFailureReplicationTest:) [JGroupsTransport] dests=null, command=CommitCommand {gtx=RecoveryAwareGlobalTransaction{xid=< 1, 64, 64, -12-63-13-63-32-39-44-29-891-73-111-107-75-113-88-108-59-88120000000000000000000000000000000000000000000, -12-63-13-63-32-39-44-29-891-73-111-107-75-113-88-108-59-88120000000000000000000000000000000000000000000 >, internalId=562962838323201} GlobalTransaction:<OriginatorAndOwnerFailureReplicationTest-NodeD-50040>:2:local, cacheName='___defaultcache', topologyId=5}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=15000
      10:38:51,322 TRACE (remote-thread-1,OriginatorAndOwnerFailureReplicationTest-NodeE:) [TxInterceptor] Remote tx topology id 4 and command topology is 5
      10:38:51,322 TRACE (remote-thread-1,OriginatorAndOwnerFailureReplicationTest-NodeE:) [TxInterceptor] Replaying the transactions received as a result of state transfer PrepareCommand {modifications=[PutKeyValueCommand{key=aKey, value=newValue, flags=null, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedMetadata{version=null}, successful=true}], onePhaseCommit=false, gtx=RecoveryAwareGlobalTransaction{xid=< 1, 64, 64, -12-63-13-63-32-39-44-29-891-73-111-107-75-113-88-108-59-88120000000000000000000000000000000000000000000, -12-63-13-63-32-39-44-29-891-73-111-107-75-113-88-108-59-88120000000000000000000000000000000000000000000 >, internalId=562962838323201} GlobalTransaction:<OriginatorAndOwnerFailureReplicationTest-NodeF-60014>:2:remote, cacheName='___defaultcache', topologyId=-1}
      10:38:51,323 TRACE (remote-thread-1,OriginatorAndOwnerFailureReplicationTest-NodeE:) [TxInterceptor] invokeNextInterceptorAndVerifyTransaction :: originatorMissing=true, alreadyCompleted=true
      10:38:51,323 TRACE (remote-thread-1,OriginatorAndOwnerFailureReplicationTest-NodeE:) [TxInterceptor] Rolling back remote transaction RecoveryAwareGlobalTransaction{xid=< 1, 64, 64, -12-63-13-63-32-39-44-29-891-73-111-107-75-113-88-108-59-88120000000000000000000000000000000000000000000, -12-63-13-63-32-39-44-29-891-73-111-107-75-113-88-108-59-88120000000000000000000000000000000000000000000 >, internalId=562962838323201} GlobalTransaction:<OriginatorAndOwnerFailureReplicationTest-NodeF-60014>:2:remote because either already completed (true) or originator no longer in the cluster (true).
      10:38:51,323 TRACE (remote-thread-1,OriginatorAndOwnerFailureReplicationTest-NodeE:) [OwnableReentrantPerEntryLockContainer] Unlocking lock instance for key aKey
      10:38:51,328 TRACE (remote-thread-1,OriginatorAndOwnerFailureReplicationTest-NodeE:) [ReadCommittedEntry] Updating entry (key=aKey removed=false valid=true changed=true created=true loaded=false value=newValue metadata=EmbeddedMetadata{version=null}, providedMetadata=null)
      

              Assignee: Unassigned
              Reporter: Dan Berindei (Inactive)
              Votes: 0
              Watchers: 1
