-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
12.0.0.Final
-
None
-
Undefined
Sometimes the backup IracPutKeyValueCommand is delayed on NodeD and it stores the value only after the test advanced the time service and invokes the get operation that's supposed to expire the entry, causing the entry expiration to fail in the NYC site.
First IracPutKeyValueCommand arrives at NodeD and locks the key:
15:59:07,907 TRACE (jgroups-7,Test-NodeB:[]) [InvocationContextInterceptor] Invoked with command PrepareCommand {modifications=[PutKeyValueCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v, flags=[], commandInvocationId=CommandInvocation:local:0, ... 15:59:07,910 TRACE (jgroups-5,bridge-org.infinispan.xsite.Test,_Test-NodeC:NYC-2:[]) [InvocationContextInterceptor] Invoked with command IracPutKeyValueCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v, metadata=EmbeddedLifespanExpirableMetadata{lifespan=1000, version=null}, privateMetadata=PrivateMetadata{iracMetadata=IracMetadata{site='LON-1', version=(LON-1=(6:1))}, entryVersion=null}, successful=true, commandInvocationId=CommandInvocation:Test-NodeC:45679} and InvocationContext [SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null, origin=null, lockOwner=CommandInvocation:Test-NodeC:45679}] 15:59:07,912 TRACE (jgroups-8,Test-NodeD:[]) [InvocationContextInterceptor] Invoked with command IracPutKeyValueCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v, metadata=EmbeddedLifespanExpirableMetadata{lifespan=1000, version=null}, privateMetadata=PrivateMetadata{iracMetadata=IracMetadata{site='LON-1', version=(LON-1=(6:1))}, entryVersion=null}, successful=true, commandInvocationId=CommandInvocation:Test-NodeC:45679} and InvocationContext [SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null, origin=Test-NodeC, lockOwner=CommandInvocation:Test-NodeC:45679}] 15:59:07,912 TRACE (jgroups-8,Test-NodeD:[]) [DefaultLockManager] Lock key=MagicKey{7FD/E1260FE1/147@Test-NodeB} for owner=CommandInvocation:Test-NodeC:45679. timeout=10000 (MILLISECONDS) 15:59:07,912 TRACE (jgroups-8,Test-NodeD:[]) [InfinispanLock] LockPlaceHolder{lockState=ACQUIRED, owner=CommandInvocation:Test-NodeC:45679} successfully acquired the lock.
Then RemoteExpiredCommand arrives at NodeD, tries to lock the key, and fails:
15:59:07,917 TRACE (testng-Test:[]) [InvocationContextInterceptor] Invoked with command RemoveExpiredCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v, lifespan=1000, maxIde=false, internalMetadata=null} and InvocationContext [SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null, origin=null, lockOwner=CommandInvocation:local:0}] 15:59:07,918 TRACE (jgroups-7,Test-NodeD:[]) [InvocationContextInterceptor] Invoked with command RemoveExpiredCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v, lifespan=1000, maxIde=false, internalMetadata=null} and InvocationContext [SingleKeyNonTxInvocationContext{isLocked=false, key=null, cacheEntry=null, origin=Test-NodeC, lockOwner=CommandInvocation:local:0}] 15:59:07,918 TRACE (jgroups-7,Test-NodeD:[]) [DefaultLockManager] Lock key=MagicKey{7FD/E1260FE1/147@Test-NodeB} for owner=CommandInvocation:local:0. timeout=0 (MILLISECONDS) 15:59:07,918 TRACE (jgroups-7,Test-NodeD:[]) [InfinispanLock] Unable to acquire. Lock is held.
Next IracPutKeyValueCommand stores the value and releases the lock:
15:59:07,919 TRACE (jgroups-8,Test-NodeD:[]) [AbstractInternalDataContainer] Store MagicKey{7FD/E1260FE1/147@Test-NodeB}=MortalCacheEntry{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v, internalMetadata=PrivateMetadata{iracMetadata=IracMetadata{site='LON-1', version=(LON-1=(6:1))}, entryVersion=null}, created=1002000, lifespan=1000} in container 15:59:07,919 TRACE (jgroups-8,Test-NodeD:[]) [DefaultLockManager] Release locks for keys=[MagicKey{7FD/E1260FE1/147@Test-NodeB}]. owner=CommandInvocation:Test-NodeC:45679
But it's too late for the RemoveExpiredCommand, which gives up:
15:59:07,920 ERROR (jgroups-7,Test-NodeD:[]) [InvocationContextInterceptor] ISPN000136: Error executing command RemoveExpiredCommand on Cache 'defaultcache', writing keys [MagicKey{7FD/E1260FE1/147@Test-NodeB}] org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key MagicKey{7FD/E1260FE1/147@Test-NodeB} and requestor CommandInvocation:local:0. Lock is held by null at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:291) ~[classes/:?] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:221) ~[classes/:?] at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:439) ~[classes/:?] at org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:411) ~[classes/:?] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:251) ~[classes/:?] at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:275) ~[classes/:?] at org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockAndRecord(AbstractLockingInterceptor.java:287) ~[classes/:?] at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitNonTxDataWriteCommand(AbstractLockingInterceptor.java:137) ~[classes/:?] at org.infinispan.interceptors.locking.AbstractLockingInterceptor.visitRemoveExpiredCommand(AbstractLockingInterceptor.java:107) ~[classes/:?] at org.infinispan.commands.write.RemoveExpiredCommand.acceptVisitor(RemoveExpiredCommand.java:46) ~[classes/:?] at org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:59) ~[classes/:?] at org.infinispan.interceptors.impl.TxInterceptor.visitRemoveExpiredCommand(TxInterceptor.java:238) ~[classes/:?] 15:59:07,926 TRACE (jgroups-7,Test-NodeD:[]) [CallInterceptor] Cannot remove entry due to it not being expired - this can be caused by different clocks on nodes or a concurrent write 15:59:07,926 TRACE (jgroups-7,Test-NodeD:[]) [BaseDistributionInterceptor] Skipping the replication of the conditional command as it did not succeed on primary owner (RemoveExpiredCommand{key=MagicKey{7FD/E1260FE1/147@Test-NodeB}, value=v, lifespan=1000, maxIde=false, internalMetadata=null}).
And because the entry was not expired, the test fails:
15:59:07,929 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.xsite.AsyncBackupExpirationTest.testExpiredAccess[LON=PESSIMISTIC_TX, NYC=OPTIMISTIC_TX_RC]([true, false]) java.lang.AssertionError: at org.testng.AssertJUnit.fail(AssertJUnit.java:59) ~[testng-6.14.3.jar:?] at org.testng.AssertJUnit.assertTrue(AssertJUnit.java:24) ~[testng-6.14.3.jar:?] at org.testng.AssertJUnit.assertNull(AssertJUnit.java:282) ~[testng-6.14.3.jar:?] at org.testng.AssertJUnit.assertNull(AssertJUnit.java:274) ~[testng-6.14.3.jar:?] at org.infinispan.xsite.AsyncBackupExpirationTest.testExpiredAccess(AsyncBackupExpirationTest.java:182) ~[test-classes/:?]
I believe this can happen in production code just as in the test, if the entry lifespan was shorter.
So we shouldn't just add a delay in the test, we should make the RemoveExpiredCommand wait for the IracPutKeyValueCommand to finish writing the value and release the lock.