java.util.concurrent.TimeoutException: Timed out waiting for event pre_rebalance_confirmation_2_from_StateTransferOverwritingValueTest-NodeB
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:50)
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:40)
at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.doTest(StateTransferOverwritingValueTest.java:201)
at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringPut(StateTransferOverwritingValueTest.java:96)
[ISPN-11609] StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringPut[SCATTERED_SYNC] random failures
The test blocks the execution of the PutKeyValueCommand or RemoveCommand replicating the value to the joiner node, but sometimes this command reaches the joiner before the it finished requesting segments and signalling "transaction data received". That means when BlockingInterceptor waits on the CyclicBarrier, it prevents state transfer from finishing and sending the RebalancePhaseConfirmCommand.
00:08:24,392 TRACE (jgroups-4,Test-NodeB:[]) [JGroupsTransport] Test-NodeB received request 3 from Test-NodeA: SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}}
00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Topology update processed, stateTransferTopologyId = 2, startRebalance = true, pending CH = ScatteredConsistentHash{ns=256, rebalanced=true, owners = (2)[Test-NodeA: 128, Test-NodeB: 128]}
00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Unlock State Transfer in Progress for topology ID 2
00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateTransferLockImpl] Signalling transaction data received for topology 2
The PutKeyValueCommand invocation resumes on the topology update thread and blocks until the test fails:
00:08:24,396 TRACE (jgroups-5,Test-NodeB:[]) [TrianglePerCacheInboundInvocationHandler] Calling perform() on SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}}
00:08:24,397 TRACE (jgroups-5,Test-NodeB:[]) [BlockingInterceptor] Command blocking after completion of PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}
00:08:34,418 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringReplace[SCATTERED_SYNC, tx=false]
java.util.concurrent.TimeoutException: Timed out waiting for event pre_rebalance_confirmation_2_from_Test-NodeB
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:50) ~[test-classes/:?]
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:40) ~[test-classes/:?]
at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.doTest(StateTransferOverwritingValueTest.java:201) ~[test-classes/:?]
at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringReplace(StateTransferOverwritingValueTest.java:108) ~[test-classes/:?]
After the PutKeyValueCommand invocation finishes, ScatteredStateConsumerImpl can confirm the state transfer:
00:08:34,438 DEBUG (jgroups-5,Test-NodeB:[]) [CLUSTER] Shutdown while handling command SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}}
00:08:34,438 DEBUG (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Finished receiving of segments for cache defaultcache for topology 2.
00:08:34,439 TRACE (jgroups-5,Test-NodeB:[]) [JGroupsTransport] Test-NodeB sending command to Test-NodeA: ConfirmRebalancePhaseCommand{cacheName='defaultcache', origin=Test-NodeB, throwable=null, topologyId=2, viewId=1}
Dan Berindei (Inactive)
added a comment - The test blocks the execution of the PutKeyValueCommand or RemoveCommand replicating the value to the joiner node, but sometimes this command reaches the joiner before the it finished requesting segments and signalling "transaction data received". That means when BlockingInterceptor waits on the CyclicBarrier , it prevents state transfer from finishing and sending the RebalancePhaseConfirmCommand .
00:08:24,392 TRACE (jgroups-4,Test-NodeB:[]) [JGroupsTransport] Test-NodeB received request 3 from Test-NodeA: SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}}
00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Topology update processed, stateTransferTopologyId = 2, startRebalance = true, pending CH = ScatteredConsistentHash{ns=256, rebalanced=true, owners = (2)[Test-NodeA: 128, Test-NodeB: 128]}
00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Unlock State Transfer in Progress for topology ID 2
00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateTransferLockImpl] Signalling transaction data received for topology 2
The PutKeyValueCommand invocation resumes on the topology update thread and blocks until the test fails:
00:08:24,396 TRACE (jgroups-5,Test-NodeB:[]) [TrianglePerCacheInboundInvocationHandler] Calling perform() on SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}}
00:08:24,397 TRACE (jgroups-5,Test-NodeB:[]) [BlockingInterceptor] Command blocking after completion of PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}
00:08:34,418 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringReplace[SCATTERED_SYNC, tx=false]
java.util.concurrent.TimeoutException: Timed out waiting for event pre_rebalance_confirmation_2_from_Test-NodeB
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:50) ~[test-classes/:?]
at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:40) ~[test-classes/:?]
at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.doTest(StateTransferOverwritingValueTest.java:201) ~[test-classes/:?]
at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringReplace(StateTransferOverwritingValueTest.java:108) ~[test-classes/:?]
After the PutKeyValueCommand invocation finishes, ScatteredStateConsumerImpl can confirm the state transfer:
00:08:34,438 DEBUG (jgroups-5,Test-NodeB:[]) [CLUSTER] Shutdown while handling command SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}}
00:08:34,438 DEBUG (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Finished receiving of segments for cache defaultcache for topology 2.
00:08:34,439 TRACE (jgroups-5,Test-NodeB:[]) [JGroupsTransport] Test-NodeB sending command to Test-NodeA: ConfirmRebalancePhaseCommand{cacheName='defaultcache', origin=Test-NodeB, throwable=null, topologyId=2, viewId=1}
The test blocks the execution of the PutKeyValueCommand or RemoveCommand replicating the value to the joiner node, but sometimes this command reaches the joiner before the it finished requesting segments and signalling "transaction data received". That means when BlockingInterceptor waits on the CyclicBarrier, it prevents state transfer from finishing and sending the RebalancePhaseConfirmCommand.
The PutKeyValueCommand invocation resumes on the topology update thread and blocks until the test fails:
After the PutKeyValueCommand invocation finishes, ScatteredStateConsumerImpl can confirm the state transfer: