Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-11609

StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringPut[SCATTERED_SYNC] random failures

This issue belongs to an archived project. You can view it, but you can't modify it. Learn more

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • 11.0.0.Final
    • 11.0.0.Dev03
    • Core, Test Suite
    • None

      java.util.concurrent.TimeoutException: Timed out waiting for event pre_rebalance_confirmation_2_from_StateTransferOverwritingValueTest-NodeB
      	at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:50)
      	at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:40)
      	at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.doTest(StateTransferOverwritingValueTest.java:201)
      	at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringPut(StateTransferOverwritingValueTest.java:96)
      

            [ISPN-11609] StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringPut[SCATTERED_SYNC] random failures

            The test blocks the execution of the PutKeyValueCommand or RemoveCommand replicating the value to the joiner node, but sometimes this command reaches the joiner before the it finished requesting segments and signalling "transaction data received". That means when BlockingInterceptor waits on the CyclicBarrier, it prevents state transfer from finishing and sending the RebalancePhaseConfirmCommand.

            00:08:24,392 TRACE (jgroups-4,Test-NodeB:[]) [JGroupsTransport] Test-NodeB received request 3 from Test-NodeA: SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}}
            00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Topology update processed, stateTransferTopologyId = 2, startRebalance = true, pending CH = ScatteredConsistentHash{ns=256, rebalanced=true, owners = (2)[Test-NodeA: 128, Test-NodeB: 128]}
            00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Unlock State Transfer in Progress for topology ID 2
            00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateTransferLockImpl] Signalling transaction data received for topology 2
            

            The PutKeyValueCommand invocation resumes on the topology update thread and blocks until the test fails:

            00:08:24,396 TRACE (jgroups-5,Test-NodeB:[]) [TrianglePerCacheInboundInvocationHandler] Calling perform() on SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}}
            00:08:24,397 TRACE (jgroups-5,Test-NodeB:[]) [BlockingInterceptor] Command blocking after completion of PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}
            00:08:34,418 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringReplace[SCATTERED_SYNC, tx=false]
            java.util.concurrent.TimeoutException: Timed out waiting for event pre_rebalance_confirmation_2_from_Test-NodeB
            	at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:50) ~[test-classes/:?]
            	at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:40) ~[test-classes/:?]
            	at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.doTest(StateTransferOverwritingValueTest.java:201) ~[test-classes/:?]
            	at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringReplace(StateTransferOverwritingValueTest.java:108) ~[test-classes/:?]
            

            After the PutKeyValueCommand invocation finishes, ScatteredStateConsumerImpl can confirm the state transfer:

            00:08:34,438 DEBUG (jgroups-5,Test-NodeB:[]) [CLUSTER] Shutdown while handling command SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}}
            00:08:34,438 DEBUG (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Finished receiving of segments for cache defaultcache for topology 2.
            00:08:34,439 TRACE (jgroups-5,Test-NodeB:[]) [JGroupsTransport] Test-NodeB sending command to Test-NodeA: ConfirmRebalancePhaseCommand{cacheName='defaultcache', origin=Test-NodeB, throwable=null, topologyId=2, viewId=1}
            

            Dan Berindei (Inactive) added a comment - The test blocks the execution of the PutKeyValueCommand or RemoveCommand replicating the value to the joiner node, but sometimes this command reaches the joiner before the it finished requesting segments and signalling "transaction data received". That means when BlockingInterceptor waits on the CyclicBarrier , it prevents state transfer from finishing and sending the RebalancePhaseConfirmCommand . 00:08:24,392 TRACE (jgroups-4,Test-NodeB:[]) [JGroupsTransport] Test-NodeB received request 3 from Test-NodeA: SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}} 00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Topology update processed, stateTransferTopologyId = 2, startRebalance = true, pending CH = ScatteredConsistentHash{ns=256, rebalanced=true, owners = (2)[Test-NodeA: 128, Test-NodeB: 128]} 00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Unlock State Transfer in Progress for topology ID 2 00:08:24,395 TRACE (jgroups-5,Test-NodeB:[]) [StateTransferLockImpl] Signalling transaction data received for topology 2 The PutKeyValueCommand invocation resumes on the topology update thread and blocks until the test fails: 00:08:24,396 TRACE (jgroups-5,Test-NodeB:[]) [TrianglePerCacheInboundInvocationHandler] Calling perform() on SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}} 00:08:24,397 TRACE (jgroups-5,Test-NodeB:[]) [BlockingInterceptor] Command blocking after completion of PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2} 00:08:34,418 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringReplace[SCATTERED_SYNC, tx=false] java.util.concurrent.TimeoutException: Timed out waiting for event pre_rebalance_confirmation_2_from_Test-NodeB at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:50) ~[test-classes/:?] at org.infinispan.test.fwk.CheckPoint.awaitStrict(CheckPoint.java:40) ~[test-classes/:?] at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.doTest(StateTransferOverwritingValueTest.java:201) ~[test-classes/:?] at org.infinispan.distribution.rehash.StateTransferOverwritingValueTest.testBackupOwnerJoiningDuringReplace(StateTransferOverwritingValueTest.java:108) ~[test-classes/:?] After the PutKeyValueCommand invocation finishes, ScatteredStateConsumerImpl can confirm the state transfer: 00:08:34,438 DEBUG (jgroups-5,Test-NodeB:[]) [CLUSTER] Shutdown while handling command SingleRpcCommand{cacheName='defaultcache', command=PutKeyValueCommand{key=key, value=v1, flags=[SKIP_OWNERSHIP_CHECK], commandInvocationId=CommandInvocation:Test-NodeA:2226, putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{version=SimpleClusteredVersion{topologyId=2, version=2}, lifespan=-1, maxIdle=-1}, successful=true, topologyId=2}} 00:08:34,438 DEBUG (jgroups-5,Test-NodeB:[]) [StateConsumerImpl] Finished receiving of segments for cache defaultcache for topology 2. 00:08:34,439 TRACE (jgroups-5,Test-NodeB:[]) [JGroupsTransport] Test-NodeB sending command to Test-NodeA: ConfirmRebalancePhaseCommand{cacheName='defaultcache', origin=Test-NodeB, throwable=null, topologyId=2, viewId=1}

              dberinde@redhat.com Dan Berindei (Inactive)
              dberinde@redhat.com Dan Berindei (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare

                Created:
                Updated:
                Resolved:
                Archived: