-
Bug
-
Resolution: Done
-
Blocker
-
8.0.0.GA-CR1
-
False
-
None
-
False
-
Regression
-
-
-
-
-
-
-
-
wildfly#129.zip Scenario: we have a 4 nodes cluster where every node is configured as in the following:
embed-server --server-config=standalone-ha.xml if (outcome != success) of /subsystem=jgroups:read-attribute(name=default-stack) /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp) else /subsystem=jgroups:write-attribute(name=default-stack,value=tcp) /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp) end-if
On each node, about 750 sessions are created; each session uses a Stateful Session Bean SFSB (we also spotted the same issue in a scenario using granular cache - more on this later);
Nodes are failed in a sequence by shutdown;
We noticed that Initial state transfer takes longer than on EAP7 and, eventually, times out:
service org.wildfly.clustering.infinispan.cache.web."clusterbench-ee10.ear.clusterbench-ee10-web.war": org.infinispan.util.concurrent.TimeoutException: ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly3 at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:66) at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$start$0(AsyncServiceConfigurator.java:117) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1990) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1348) at java.base/java.lang.Thread.run(Thread.java:834) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.JBossThread.run(JBossThread.java:513) Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly3 at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:239) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1013) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:504) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:726) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:672) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:561) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:575) at org.jboss.as.clustering.infinispan@8.0.0.GA-redhat-00009//org.jboss.as.clustering.infinispan.manager.DefaultCacheContainer.getCache(DefaultCacheContainer.java:118) at org.jboss.as.clustering.infinispan@8.0.0.GA-redhat-00009//org.jboss.as.clustering.infinispan.manager.DefaultCacheContainer.getCache(DefaultCacheContainer.java:113) at org.wildfly.clustering.infinispan.embedded.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.infinispan.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:70) at org.wildfly.clustering.infinispan.embedded.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.infinispan.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:54) at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:63) ... 7 more
We could reproduce the issue several times, and here we describe what happened in one of these runs:
- shut down node1
- restart node1
- shut down node2
- restart node2
- shut down node3
- restart node3
- shut down node4
- node3 logs "ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly3" (please note wildfly3=node3)
- restart node4
Complete logs in wildfly#137.zip;
Also in this other 2 runs: wildfly#129.zip and wildfly#135.zip , the pattern is the same as in wildfly#137.zip;
We also spotted the same issue in a test using granular cache (not using SFSB): the pattern is the same as in wildfly#137.zip and the complete logs are here: wildfly-granular#123.zip ;
In run wildfly#136.zip the pattern is slightly different and the error pops out after node1 has been shut down & restarted and node2 has just been stopped:
2023-10-10 11:14:33,823 INFO [org.infinispan.CLUSTER] (thread-26,ejb,wildfly1) ISPN100001: Node wildfly2 left the cluster ... 2023-10-10 11:15:51,011 WARN [org.infinispan.transaction.impl.TransactionTable] (ServerService Thread Pool -- 82) ISPN000100: Stopping, but there are 0 local transactions and 11 remote transactions that did not finish in time. 2023-10-10 11:15:51,029 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 82) WFLYCLINF0003: Stopped clusterbench-ee10.ear.clusterbench-ee10-web.war cache from web container 2023-10-10 11:15:51,030 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 82) MSC000001: Failed to start service org.wildfly.clustering.infinispan.cache.web."clusterbench-ee10.ear.clusterbench-ee10-web.war": org.jboss.msc.service.StartException in service org.wildfly.clustering.infinispan.cache.web."clusterbench-ee10.ear.clusterbench-ee10-web.war": org.infinispan.util.concurrent.TimeoutException: ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly1 at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:66) at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$start$0(AsyncServiceConfigurator.java:117) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1990) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377) at java.base/java.lang.Thread.run(Thread.java:834) at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.JBossThread.run(JBossThread.java:513) Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly1 at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:239) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1013) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:504) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:726) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:672) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:561) at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:575) at org.jboss.as.clustering.infinispan@8.0.0.GA-redhat-00009//org.jboss.as.clustering.infinispan.manager.DefaultCacheContainer.getCache(DefaultCacheContainer.java:118) at org.jboss.as.clustering.infinispan@8.0.0.GA-redhat-00009//org.jboss.as.clustering.infinispan.manager.DefaultCacheContainer.getCache(DefaultCacheContainer.java:113) at org.wildfly.clustering.infinispan.embedded.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.infinispan.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:70) at org.wildfly.clustering.infinispan.embedded.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.infinispan.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:54) at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:63) ... 7 more ... 2023-10-10 11:16:26,479 INFO [org.infinispan.CLUSTER] (thread-42,ejb,wildfly1) ISPN100000: Node wildfly2 joined the cluster
Please note the two logs just before the error which look suspicious:
2023-10-10 11:15:51,011 WARN [org.infinispan.transaction.impl.TransactionTable] (ServerService Thread Pool -- 82) ISPN000100: Stopping, but there are 0 local transactions and 11 remote transactions that did not finish in time. 2023-10-10 11:15:51,029 INFO [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 82) WFLYCLINF0003: Stopped clusterbench-ee10.ear.clusterbench-ee10-web.war cache from web container
- is cloned by
-
JBEAP-26166 [CLUSTERING] ISPN000658: Initial state transfer timed out
- Closed