-
Bug
-
Resolution: Done
-
Major
-
9.4.7.Final, 10.0.0.Final
-
None
-
DataGrid Sprint #30
When an exception happens in ScatteredStateConsumerImpl.beforeTopologyInstalled, the exclusive topology lock is not released in StateConsumerImpl.onTopologyUpdate:
15:21:54,783 ERROR (transport-thread-FunctionalScatteredInMemoryTest-NodeA-p43135-t5:[Topology-scattered]) [LocalTopologyManagerImpl] ISPN000230: Failed to start rebalance for cache scattered java.lang.IllegalArgumentException: The task is already cancelled. at org.infinispan.statetransfer.InboundTransferTask.cancelSegments(InboundTransferTask.java:172) ~[classes/:?] at org.infinispan.statetransfer.StateConsumerImpl.cancelTransfers(StateConsumerImpl.java:959) ~[classes/:?] at org.infinispan.scattered.impl.ScatteredStateConsumerImpl.beforeTopologyInstalled(ScatteredStateConsumerImpl.java:115) ~[classes/:?] at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:292) ~[classes/:?] at org.infinispan.scattered.impl.ScatteredStateConsumerImpl.onTopologyUpdate(ScatteredStateConsumerImpl.java:102) ~[classes/:?] at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:200) ~[classes/:?]
Because the exclusive topology lock is not released, threads that try to apply a new topology update block forever. This causes random failures with the ISPN-9863 thread leak checker:
15:26:25,922 WARN (testng-RehashClusterPublisherManagerTest:[]) [ThreadLeakChecker] Possible leaked thread: "transport-thread-FunctionalScatteredInMemoryTest-NodeA-p43135-t3" daemon prio=5 tid=0x236fd nid=NA waiting java.lang.Thread.State: WAITING java.base@11/jdk.internal.misc.Unsafe.park(Native Method) java.base@11/java.util.concurrent.locks.LockSupport.park(LockSupport.java:194) java.base@11/java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:885) java.base@11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:917) java.base@11/java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1240) java.base@11/java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:959) app//org.infinispan.statetransfer.StateTransferLockImpl.acquireExclusiveTopologyLock(StateTransferLockImpl.java:42) app//org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:291) app//org.infinispan.scattered.impl.ScatteredStateConsumerImpl.onTopologyUpdate(ScatteredStateConsumerImpl.java:102) app//org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:200) app//org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:57) app//org.infinispan.statetransfer.StateTransferManagerImpl$1.updateConsistentHash(StateTransferManagerImpl.java:113) app//org.infinispan.topology.LocalTopologyManagerImpl.doHandleTopologyUpdate(LocalTopologyManagerImpl.java:353) app//org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleTopologyUpdate$1(LocalTopologyManagerImpl.java:275) 15:26:25,923 ERROR (testng-RehashClusterPublisherManagerTest:[]) [TestSuiteProgress] Test configuration failed: org.infinispan.reactive.publisher.impl.RehashClusterPublisherManagerTest.testClassFinished java.lang.AssertionError: Leaked threads: {transport-thread-FunctionalScatteredInMemoryTest-NodeA-p43135-t3: possible sources [org.infinispan.functional.FunctionalScatteredInMemoryTest[bias=ON_WRITE], org.infinispan.statetransfer.ClusterTopologyManagerTest[SCATTERED_SYNC, tx=false], org.infinispan.functional.FunctionalCachestoreTest[passivation=true], org.infinispan.functional.distribution.rehash.FunctionalNonTxBackupOwnerBecomingPrimaryOwnerTest, org.infinispan.functional.distribution.rehash.FunctionalNonTxJoinerBecomingBackupOwnerTest, org.infinispan.api.mvcc.PutForExternalReadTest[REPL_SYNC, tx=false], org.infinispan.functional.distribution.rehash.FunctionalTxTest, org.infinispan.functional.FunctionalEncodingTypeTest[tx=true]]} at org.infinispan.commons.test.ThreadLeakChecker.performCheck(ThreadLeakChecker.java:148) ~[infinispan-commons-test-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT] at org.infinispan.commons.test.ThreadLeakChecker.testFinished(ThreadLeakChecker.java:109) ~[infinispan-commons-test-10.0.0-SNAPSHOT.jar:10.0.0-SNAPSHOT] at org.infinispan.test.fwk.TestResourceTracker.testFinished(TestResourceTracker.java:112) ~[test-classes/:?] at org.infinispan.test.AbstractInfinispanTest.testClassFinished(AbstractInfinispanTest.java:142) ~[test-classes/:?]
The fix should address both the exclusive topology lock itself, by releasing it in a finally block, and the IllegalArgumentException, either by ignoring already cancelled transfers or by only cancelling transfers while holding transferMapsLock.