- Bug
- Resolution: Done
- Major
- 14.0.21.Final
- None
I have a failover scenario in which a load of 490k entries starts on site1 and, 10 seconds later, the whole site1 cluster is killed. The client that was loading entries to site1 then switches automatically to site2, and in most cases the run completes successfully. Sometimes, however (4 out of 10 runs), at the moment site1 is killed the client throws the following exception:
03:33:40,348 ERROR [org.infinispan.HOTROD] (HotRod-client-async-pool-1-1) Failed to complete PutOperation{api-general-filestore-without-expiration, key=[B0x4A146B65795F30303030303030303030..[22], value=[B0x4A5276616C75655F6B65795F30303030..[84], flags=6, connection=10.16.90.106/10.16.90.106:11222}
java.lang.NullPointerException: null
	at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.executeDirectlyIfPossible(ChannelPool.java:113) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.transport.netty.ChannelPool.acquire(ChannelPool.java:79) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:328) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:307) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:321) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:306) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory.fetchChannelAndInvoke(ChannelFactory.java:282) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.operations.AbstractKeyOperation.fetchChannelAndInvoke(AbstractKeyOperation.java:47) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.retryIfNotDone(RetryOnFailureOperation.java:101) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.logAndRetryOrFail(RetryOnFailureOperation.java:208) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.operations.RetryOnFailureOperation.channelInactive(RetryOnFailureOperation.java:136) ~[infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at org.infinispan.client.hotrod.impl.transport.netty.HeaderDecoder.channelInactive(HeaderDecoder.java:232) [infinispan-client-hotrod-14.0.22-SNAPSHOT.jar:14.0.22-SNAPSHOT]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:305) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelInactive(ChannelInboundHandlerAdapter.java:81) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.handler.timeout.IdleStateHandler.channelInactive(IdleStateHandler.java:277) [netty-handler-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:303) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:274) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1405) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:301) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:281) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:901) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.AbstractChannel$AbstractUnsafe$7.run(AbstractChannel.java:813) [netty-transport-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:173) [netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:166) [netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) [netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:413) [netty-transport-classes-epoll-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.100.Final.jar:4.1.100.Final]
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.100.Final.jar:4.1.100.Final]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
After that, the entry-loading process hangs. I took a thread dump on the client at that point and found this part:
Thread dump result on client:

"Loader-1" #58 prio=5 os_prio=0 cpu=314.65ms elapsed=7391.68s tid=0x00007faeb03f9800 nid=0x272 waiting on condition [0x00007fae533f2000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.12/Native Method)
	- parking to wait for <0x000000060d630c78> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.12/LockSupport.java:234)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@11.0.12/CompletableFuture.java:1798)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@11.0.12/ForkJoinPool.java:3128)
	at java.util.concurrent.CompletableFuture.timedGet(java.base@11.0.12/CompletableFuture.java:1868)
	at java.util.concurrent.CompletableFuture.get(java.base@11.0.12/CompletableFuture.java:2021)
	at org.infinispan.client.hotrod.impl.Util.await(Util.java:52)
	at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:196)
	at org.infinispan.client.hotrod.impl.RemoteCacheSupport.put(RemoteCacheSupport.java:186)
	at org.radargun.service.HotRodOperations$HotRodCache.put(HotRodOperations.java:83)
	at org.radargun.stages.cache.test.LoadStage$NonTxLoader.loadDataUnit(LoadStage.java:275)
	at org.radargun.stages.test.LoadStage$Loader.run(LoadStage.java:125)
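The thread dump above shows the loader parked inside the synchronous put, waiting on a CompletableFuture that apparently never completes once the NPE has fired. As a hedged, self-contained illustration of that failure mode (this is plain JDK code, not Infinispan internals; putAsyncStub is a hypothetical stand-in for an async put whose completion is lost), the wait can be bounded with orTimeout so the caller gets a TimeoutException instead of parking indefinitely:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;

public class TimedPutSketch {
    // Hypothetical stand-in for an async put whose completion is lost:
    // the returned future is never completed, mimicking the observed hang.
    static CompletableFuture<Void> putAsyncStub(String key, String value) {
        return new CompletableFuture<>();
    }

    public static void main(String[] args) {
        CompletableFuture<Void> put = putAsyncStub("key_0", "value_0")
                .orTimeout(500, TimeUnit.MILLISECONDS); // bound the wait
        try {
            put.join();
            System.out.println("put completed");
        } catch (CompletionException e) {
            // orTimeout completes the future exceptionally with a
            // TimeoutException, so the caller regains control.
            System.out.println("put failed: " + e.getCause().getClass().getSimpleName());
        }
    }
}
```

This does not fix the underlying NullPointerException; it only sketches why a bounded wait on the client side would surface the lost completion as an error rather than a hang.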
The cache configuration used for this scenario is given below; the issue appears with both the SYNC and ASYNC backup strategies:
<distributed-cache name="api-general-filestore-without-expiration" owners="2" segments="30" mode="SYNC" remote-timeout="25000" start="EAGER">
    <backups>
        <backup site="site02" strategy="ASYNC" timeout="30000">
            <take-offline after-failures="-1" min-wait="60000"/>
        </backup>
    </backups>
</distributed-cache>
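For context, the automatic site switch on the client side is driven by the Hot Rod client's cluster configuration rather than by the server-side cache config above. A sketch in hotrod-client.properties form, assuming the documented infinispan.client.hotrod.cluster.<cluster-name> property (the host names and ports here are placeholders, not taken from this report):

```properties
# Default cluster: the site1 servers the client connects to initially.
infinispan.client.hotrod.server_list = site1-host1:11222;site1-host2:11222
# Backup cluster "site02": the client fails over here when site1 is unreachable.
infinispan.client.hotrod.cluster.site02 = site2-host1:11222;site2-host2:11222
```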
Please note that the scenario is executed under JDK 11, as that is what runs in our perf lab. At the moment there is no way to run the scenario under JDK 17 to get a more detailed message on the exception.
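To make the expected behavior explicit, the retry-and-switch semantics the scenario relies on can be sketched generically (this illustrates the intended failover outcome, not Infinispan's actual implementation; tryPut and the site names are hypothetical):

```java
import java.util.List;

public class FailoverSketch {
    // Hypothetical transport: site1 has been killed, site2 accepts writes.
    static boolean tryPut(String site, String key, String value) {
        return !"site1".equals(site);
    }

    // Try each configured site in order; fail loudly instead of hanging
    // if every site rejects the write.
    static String putWithFailover(List<String> sites, String key, String value) {
        for (String site : sites) {
            if (tryPut(site, key, value)) {
                return site; // the site that served the write
            }
        }
        throw new IllegalStateException("all sites failed for key " + key);
    }

    public static void main(String[] args) {
        String site = putWithFailover(List.of("site1", "site2"), "key_0", "value_0");
        System.out.println("put served by " + site); // put served by site2
    }
}
```

In the failing runs described above, the client gets as far as the retry (logAndRetryOrFail in the stack trace) but the NPE in ChannelPool prevents the switch from completing, which is why the loader hangs instead of reaching site2.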