Type: Bug
Resolution: Done
Priority: Major
Labels: None
Sprint: 9.4.0.CR3
Not sure what changed recently, but the thread dumps show a state transfer executor thread blocked waiting for a clustered listeners response. The stack contains two instances of ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(), which suggests that at some point all six state transfer executor threads and all four async transport threads were busy, and the transport thread pool queue (capacity 10) was also full.
"stateTransferExecutor-thread-PessimisticTxPartitionAndMergeDuringRollbackTest-NodeC-p57758-t1" #192601 daemon prio=5 os_prio=0 tid=0x00007f7094031800 nid=0x5b27 waiting on condition [0x00007f70190ce000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for <0x00000000d470b0f8> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1695)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
	at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1775)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1915)
	at org.infinispan.util.concurrent.CompletableFutures.await(CompletableFutures.java:93)
	at org.infinispan.remoting.rpc.RpcManagerImpl.blocking(RpcManagerImpl.java:262)
	at org.infinispan.statetransfer.StateConsumerImpl.getClusterListeners(StateConsumerImpl.java:895)
	at org.infinispan.statetransfer.StateConsumerImpl.fetchClusterListeners(StateConsumerImpl.java:453)
	at org.infinispan.statetransfer.StateConsumerImpl.onTopologyUpdate(StateConsumerImpl.java:309)
	at org.infinispan.statetransfer.StateTransferManagerImpl.doTopologyUpdate(StateTransferManagerImpl.java:197)
	at org.infinispan.statetransfer.StateTransferManagerImpl.access$000(StateTransferManagerImpl.java:54)
	at org.infinispan.statetransfer.StateTransferManagerImpl$1.rebalance(StateTransferManagerImpl.java:117)
	at org.infinispan.topology.LocalTopologyManagerImpl.doHandleRebalance(LocalTopologyManagerImpl.java:517)
	- locked <0x00000000cc304f88> (a org.infinispan.topology.LocalCacheStatus)
	at org.infinispan.topology.LocalTopologyManagerImpl.lambda$handleRebalance$3(LocalTopologyManagerImpl.java:475)
	at org.infinispan.topology.LocalTopologyManagerImpl$$Lambda$429/1368424830.run(Unknown Source)
	at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
	at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
	at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
	at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2038)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
	at org.infinispan.executors.LazyInitializingExecutorService.execute(LazyInitializingExecutorService.java:121)
	at org.infinispan.executors.LimitedExecutor.tryExecute(LimitedExecutor.java:151)
	at org.infinispan.executors.LimitedExecutor.executeInternal(LimitedExecutor.java:118)
	at org.infinispan.executors.LimitedExecutor.execute(LimitedExecutor.java:108)
	at org.infinispan.topology.LocalTopologyManagerImpl.handleRebalance(LocalTopologyManagerImpl.java:473)
	at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:199)
	at org.infinispan.topology.CacheTopologyControlCommand.invokeAsync(CacheTopologyControlCommand.java:160)
	at org.infinispan.commands.ReplicableCommand.invoke(ReplicableCommand.java:44)
	at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$executeOnClusterAsync$5(ClusterTopologyManagerImpl.java:600)
	at org.infinispan.topology.ClusterTopologyManagerImpl$$Lambda$304/909965247.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:2038)
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830)
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379)
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
	at org.infinispan.executors.LazyInitializingExecutorService.submit(LazyInitializingExecutorService.java:91)
	at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterAsync(ClusterTopologyManagerImpl.java:596)
	at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastRebalanceStart(ClusterTopologyManagerImpl.java:437)
	at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:903)
	- locked <0x00000000cc305138> (a org.infinispan.topology.ClusterCacheStatus)
	at org.infinispan.topology.ClusterCacheStatus.queueRebalance(ClusterCacheStatus.java:140)
	- locked <0x00000000cc305138> (a org.infinispan.topology.ClusterCacheStatus)
	at org.infinispan.partitionhandling.impl.PreferConsistencyStrategy.updateMembersAndRebalance(PreferConsistencyStrategy.java:299)
	at org.infinispan.partitionhandling.impl.PreferConsistencyStrategy.onPartitionMerge(PreferConsistencyStrategy.java:245)
	at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:642)
	- locked <0x00000000cc305138> (a org.infinispan.topology.ClusterCacheStatus)
	at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$4(ClusterTopologyManagerImpl.java:494)
	at org.infinispan.topology.ClusterTopologyManagerImpl$$Lambda$578/46555845.run(Unknown Source)
	at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
	at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
	at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
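The rejectedExecution() frames in the dump are the key mechanism: with CallerRunsPolicy, a task rejected by a saturated pool runs synchronously on the submitting thread, so any blocking call inside the rejected task now blocks the submitter. A minimal sketch of that behavior in isolation (class name, pool size 1, and queue capacity 1 are hypothetical choices for illustration, not Infinispan's configuration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class CallerRunsDemo {

    // Returns true if a task rejected by a saturated pool ran on the submitting thread.
    public static boolean rejectedTaskRunsOnCaller() {
        // 1 worker thread, queue of capacity 1, CallerRunsPolicy on rejection
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());

        CountDownLatch release = new CountDownLatch(1);

        // Saturate the pool: one task occupies the single worker, one fills the queue.
        pool.execute(() -> awaitQuietly(release));
        pool.execute(() -> awaitQuietly(release));

        // The third task is rejected; CallerRunsPolicy executes it inline,
        // synchronously, on this (the submitting) thread.
        String[] ranOn = new String[1];
        pool.execute(() -> ranOn[0] = Thread.currentThread().getName());

        release.countDown();
        pool.shutdown();
        return Thread.currentThread().getName().equals(ranOn[0]);
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        // prints: rejected task ran on the submitting thread: true
        System.out.println("rejected task ran on the submitting thread: "
                + rejectedTaskRunsOnCaller());
    }
}
```

In the dump this happens twice, nested: the topology thread's submission to the transport pool is rejected and runs inline, and a further submission inside it is rejected again, so the thread ends up executing the rebalance path itself while holding the ClusterCacheStatus lock and blocking on a remote response.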
All partition and merge tests seem to be affected: PessimisticTxPartitionAndMergeDuringPrepareTest, PessimisticTxPartitionAndMergeDuringRollbackTest, PessimisticTxPartitionAndMergeDuringRuntimeTest, OptimisticTxPartitionAndMergeDuringCommitTest, OptimisticTxPartitionAndMergeDuringPrepareTest, and OptimisticTxPartitionAndMergeDuringRollbackTest.