-
Bug
-
Resolution: Done
-
Major
-
9.1.0.Final, 9.0.3.Final
https://github.com/infinispan/infinispan/pull/5143 fixes the random test failures in PessimisticTxPartitionAndMergeDuringRuntimeTest.testOriginatorIsolatedPartition, but it uncovers another random failure in OptimisticTxPartitionAndMergeDuringCommitTest.testDegradedPartitionWithDiscard.
When partition handling is enabled, TransactionTable.cleanupLeaverTransactions() does not roll back transactions from leavers. Instead, it keeps them in limbo until it sees a stable cache topology (i.e. either until the cache's stable topology is updated, or until all the stable topology's members are re-added to the current topology). TxInterceptor.verifyRemoteTransaction(), in contrast, always rolls back the transaction if the originator is not in the cluster view, so when the originator tries to complete the transaction after the merge it gets an exception:
10:27:34,880 WARN (remote-thread-Test-NodeG-p46360-t6:[]) [NonTotalOrderTxPerCacheInboundInvocationHandler] ISPN000071: Caught exception when handling command VersionedCommitCommand{gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache', topologyId=16, updatedVersions={MagicKey#k1{168F/00552148/106@Test-NodeF-27031}=SimpleClusteredVersion{topologyId=13, version=2}, MagicKey#k2{1690/CCE79580/35@Test-NodeG-8587}=SimpleClusteredVersion{topologyId=13, version=2}}}
org.infinispan.commons.CacheException: ISPN000361: Cannot commit remote transaction GlobalTx:Test-NodeE-10968:31983 as it was already rolled back
    at org.infinispan.commands.tx.CommitCommand.invalidRemoteTxReturnValue(CommitCommand.java:49) ~[classes/:?]
    at org.infinispan.commands.tx.AbstractTransactionBoundaryCommand.invokeAsync(AbstractTransactionBoundaryCommand.java:98) ~[classes/:?]
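The disagreement between the two code paths can be illustrated with a minimal model. This is only a sketch of the behaviour described above, not Infinispan code: the class LeaverTxPolicySketch, the Decision enum and both method names are hypothetical, and the "stable topology" condition is reduced to a single boolean.

```java
public class LeaverTxPolicySketch {

    /** What a node does with a remote transaction it is holding. */
    public enum Decision { KEEP, ROLL_BACK }

    /**
     * Hypothetical model of the cleanup path: with partition handling enabled,
     * transactions from leavers are kept until a stable topology is seen.
     */
    public static Decision cleanupLeaverDecision(boolean partitionHandlingEnabled,
                                                 boolean originatorInView,
                                                 boolean stableTopologySeen) {
        if (originatorInView) {
            return Decision.KEEP;
        }
        if (partitionHandlingEnabled && !stableTopologySeen) {
            return Decision.KEEP; // transaction stays "in limbo"
        }
        return Decision.ROLL_BACK;
    }

    /**
     * Hypothetical model of the verification path: the transaction is rolled
     * back whenever the originator is missing from the cluster view.
     */
    public static Decision verifyRemoteDecision(boolean originatorInView) {
        return originatorInView ? Decision.KEEP : Decision.ROLL_BACK;
    }

    public static void main(String[] args) {
        // Degraded partition: partition handling on, originator not in view,
        // no stable topology yet. The two paths reach different conclusions.
        Decision cleanup = cleanupLeaverDecision(true, false, false);
        Decision verify = verifyRemoteDecision(false);
        System.out.println("cleanup=" + cleanup + " verify=" + verify);
    }
}
```

In this model the two methods agree in every case except the degraded-partition one, which is the case the failing test exercises.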
The test actually tries to ensure that the CommitCommand is never executed on the owner before the split, only after the merge. But the DiscardFilter that it uses blocks only one invocation, so it lets the commit proceed when the originator retries:
10:27:34,394 DEBUG (jgroups-6,Test-NodeG-8587:[]) [BaseTxPartitionAndMergeTest] Ignoring command VersionedCommitCommand{gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache', topologyId=13, updatedVersions={MagicKey#k1{168F/00552148/106@Test-NodeF-27031}=SimpleClusteredVersion{topologyId=13, version=2}, MagicKey#k2{1690/CCE79580/35@Test-NodeG-8587}=SimpleClusteredVersion{topologyId=13, version=2}}}
10:27:34,416 DEBUG (transport-thread-Test-NodeE-p46282-t3:[Topology-opt-cache]) [LocalTopologyManagerImpl] Updating local topology for cache opt-cache: CacheTopology{id=14, rebalanceId=4, currentCH=DefaultConsistentHash{ns=256, owners = (4)[Test-NodeE-10968: 67+67, Test-NodeF-27031: 59+65, Test-NodeG-8587: 63+57, Test-NodeH-3978: 67+67]}, pendingCH=null, unionCH=null, phase=NO_REBALANCE, actualMembers=[Test-NodeE-10968, Test-NodeF-27031], persistentUUIDs=[72351dc9-f621-41df-896b-1dc2f26798f5, 61811c2f-4931-49e2-b395-4debd39f6ca1]}
10:27:34,442 TRACE (jgroups-6,Test-NodeE-10968:[]) [RpcManagerImpl] Response(s) to VersionedCommitCommand{gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache', topologyId=13, updatedVersions={MagicKey#k1{168F/00552148/106@Test-NodeF-27031}=SimpleClusteredVersion{topologyId=13, version=2}, MagicKey#k2{1690/CCE79580/35@Test-NodeG-8587}=SimpleClusteredVersion{topologyId=13, version=2}}} is {Test-NodeG-8587=CacheNotFoundResponse, Test-NodeF-27031=SuccessfulResponse(null)}
10:27:34,442 TRACE (jgroups-6,Test-NodeE-10968:[]) [TxDistributionInterceptor] We have a newer topology, ignoring responses and retrying
10:27:34,451 TRACE (jgroups-6,Test-NodeE-10968:[]) [RpcManagerImpl] Test-NodeE-10968 invoking VersionedCommitCommand{gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache', topologyId=14, updatedVersions={MagicKey#k1{168F/00552148/106@Test-NodeF-27031}=SimpleClusteredVersion{topologyId=13, version=2}, MagicKey#k2{1690/CCE79580/35@Test-NodeG-8587}=SimpleClusteredVersion{topologyId=13, version=2}}} to recipient list [Test-NodeF-27031, Test-NodeG-8587] with options RpcOptions{timeout=15000, unit=MILLISECONDS, deliverOrder=NONE, responseFilter=null, responseMode=SYNCHRONOUS_IGNORE_LEAVERS}
10:27:34,637 TRACE (remote-thread-Test-NodeG-p46360-t6:[]) [TxInterceptor] Replaying the transactions received as a result of state transfer VersionedPrepareCommand {modifications=[PutKeyValueCommand{key=MagicKey#k1{168F/00552148/106@Test-NodeF-27031}, value=final-value, flags=[], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=13}, PutKeyValueCommand{key=MagicKey#k2{1690/CCE79580/35@Test-NodeG-8587}, value=final-value, flags=[], commandInvocationId=CommandInvocation:local:0, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true, topologyId=13}], onePhaseCommit=false, retried=false, versionsSeen=null, gtx=GlobalTx:Test-NodeE-10968:31983, cacheName='opt-cache'}
10:27:34,661 TRACE (remote-thread-Test-NodeG-p46360-t6:[]) [TxInterceptor] Rolling back remote transaction GlobalTx:Test-NodeE-10968:31983 because either already completed (false) or originator no longer in the cluster (true).
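The difference between a filter that drops one invocation and one that keeps dropping until told otherwise can be sketched as follows. This is an illustration only: OneShotDiscard and StickyDiscard are hypothetical classes, not the DiscardFilter used by the test, and the sticky variant is one possible way to hold back retries rather than the fix adopted for this issue.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DiscardFilterSketch {

    /** Drops only the first matching command; any retry passes through. */
    public static final class OneShotDiscard {
        private final String gtx;
        private final AtomicBoolean armed = new AtomicBoolean(true);

        public OneShotDiscard(String gtx) {
            this.gtx = gtx;
        }

        public boolean shouldDiscard(String commandGtx) {
            // compareAndSet succeeds exactly once, so only one invocation is dropped.
            return gtx.equals(commandGtx) && armed.compareAndSet(true, false);
        }
    }

    /** Drops every matching command until release() is called, e.g. after the merge. */
    public static final class StickyDiscard {
        private final String gtx;
        private volatile boolean discarding = true;

        public StickyDiscard(String gtx) {
            this.gtx = gtx;
        }

        public void release() {
            discarding = false;
        }

        public boolean shouldDiscard(String commandGtx) {
            return discarding && gtx.equals(commandGtx);
        }
    }

    public static void main(String[] args) {
        String gtx = "GlobalTx:Test-NodeE-10968:31983";

        OneShotDiscard oneShot = new OneShotDiscard(gtx);
        System.out.println("one-shot, first commit: " + oneShot.shouldDiscard(gtx));
        System.out.println("one-shot, retry:        " + oneShot.shouldDiscard(gtx));

        StickyDiscard sticky = new StickyDiscard(gtx);
        System.out.println("sticky, first commit:   " + sticky.shouldDiscard(gtx));
        System.out.println("sticky, retry:          " + sticky.shouldDiscard(gtx));
        sticky.release(); // simulate the merge completing
        System.out.println("sticky, after release:  " + sticky.shouldDiscard(gtx));
    }
}
```

With the one-shot variant, the retried commit (topologyId=14 in the log above) reaches the owner while the partition is still degraded, which matches the sequence that ends in the rollback.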
- relates to
-
ISPN-6997 PessimisticTxPartitionAndMergeDuringRuntimeTest.testOriginatorIsolatedPartition random failures
- Closed