Uploaded image for project: 'JBoss Enterprise Application Platform'
  1. JBoss Enterprise Application Platform
  2. JBEAP-26166

[CLUSTERING] ISPN000658: Initial state transfer timed out

XMLWordPrintable

      [^wildfly#129.zip] Scenario: we have a 4 nodes cluster where every node is configured as in the following:

      embed-server --server-config=standalone-ha.xml
      if (outcome != success) of /subsystem=jgroups:read-attribute(name=default-stack)
      /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
      else
      /subsystem=jgroups:write-attribute(name=default-stack,value=tcp)
      /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
      end-if
      

      On each node, about 750 sessions are created; each session uses a Stateful Session Bean SFSB (we also spotted the same issue in a scenario using granular cache - more on this later);
      Nodes are failed in a sequence by shutdown;

      We noticed that Initial state transfer takes longer than on EAP7 and, eventually, times out:

      service org.wildfly.clustering.infinispan.cache.web."clusterbench-ee10.ear.clusterbench-ee10-web.war": org.infinispan.util.concurrent.TimeoutException: ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly3
      	at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:66)
      	at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$start$0(AsyncServiceConfigurator.java:117)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1990)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1348)
      	at java.base/java.lang.Thread.run(Thread.java:834)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.JBossThread.run(JBossThread.java:513)
      Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly3
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:239)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1013)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:504)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:726)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:672)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:561)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:575)
      	at org.jboss.as.clustering.infinispan@8.0.0.GA-redhat-00009//org.jboss.as.clustering.infinispan.manager.DefaultCacheContainer.getCache(DefaultCacheContainer.java:118)
      	at org.jboss.as.clustering.infinispan@8.0.0.GA-redhat-00009//org.jboss.as.clustering.infinispan.manager.DefaultCacheContainer.getCache(DefaultCacheContainer.java:113)
      	at org.wildfly.clustering.infinispan.embedded.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.infinispan.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:70)
      	at org.wildfly.clustering.infinispan.embedded.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.infinispan.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:54)
      	at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:63)
      	... 7 more
      

      We could reproduce the issue several times, and here we describe what happened in one of these runs:

      1. shut down node1
      2. restart node1
      3. shut down node2
      4. restart node2
      5. shut down node3
      6. restart node3
      7. shut down node4
      8. node3 logs "ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly3" (please note wildfly3=node3)
      9. restart node4

      Complete logs in [^wildfly#137.zip];

      Also in this other 2 runs: [^wildfly#129.zip] and [^wildfly#135.zip] , the pattern is the same as in [^wildfly#137.zip];

      We also spotted the same issue in a test using granular cache (not using SFSB): the pattern is the same as in [^wildfly#137.zip] and the complete logs are here: [^wildfly-granular#123.zip] ;

      In run [^wildfly#136.zip] the pattern is slightly different and the error pops out after node1 has been shut down & restarted and node2 has just been stopped:

      2023-10-10 11:14:33,823 INFO  [org.infinispan.CLUSTER] (thread-26,ejb,wildfly1) ISPN100001: Node wildfly2 left the cluster
      ...
      2023-10-10 11:15:51,011 WARN  [org.infinispan.transaction.impl.TransactionTable] (ServerService Thread Pool -- 82) ISPN000100: Stopping, but there are 0 local transactions and 11 remote transactions that did not finish in time.
      2023-10-10 11:15:51,029 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 82) WFLYCLINF0003: Stopped clusterbench-ee10.ear.clusterbench-ee10-web.war cache from web container
      2023-10-10 11:15:51,030 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 82) MSC000001: Failed to start service org.wildfly.clustering.infinispan.cache.web."clusterbench-ee10.ear.clusterbench-ee10-web.war": org.jboss.msc.service.StartException in service org.wildfly.clustering.infinispan.cache.web."clusterbench-ee10.ear.clusterbench-ee10-web.war": org.infinispan.util.concurrent.TimeoutException: ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly1
      	at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:66)
      	at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$start$0(AsyncServiceConfigurator.java:117)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1990)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1486)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1377)
      	at java.base/java.lang.Thread.run(Thread.java:834)
      	at org.jboss.threads@2.4.0.Final-redhat-00001//org.jboss.threads.JBossThread.run(JBossThread.java:513)
      Caused by: org.infinispan.util.concurrent.TimeoutException: ISPN000658: Initial state transfer timed out for cache clusterbench-ee10.ear.clusterbench-ee10-web.war on wildfly1
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:239)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1013)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:504)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:726)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:672)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:561)
      	at org.infinispan@14.0.11.Final-redhat-00001//org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:575)
      	at org.jboss.as.clustering.infinispan@8.0.0.GA-redhat-00009//org.jboss.as.clustering.infinispan.manager.DefaultCacheContainer.getCache(DefaultCacheContainer.java:118)
      	at org.jboss.as.clustering.infinispan@8.0.0.GA-redhat-00009//org.jboss.as.clustering.infinispan.manager.DefaultCacheContainer.getCache(DefaultCacheContainer.java:113)
      	at org.wildfly.clustering.infinispan.embedded.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.infinispan.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:70)
      	at org.wildfly.clustering.infinispan.embedded.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.infinispan.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:54)
      	at org.wildfly.clustering.service@8.0.0.GA-redhat-00009//org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:63)
      	... 7 more
      ...
      2023-10-10 11:16:26,479 INFO  [org.infinispan.CLUSTER] (thread-42,ejb,wildfly1) ISPN100000: Node wildfly2 joined the cluster
      

      Please note the two logs just before the error which look suspicious:

      2023-10-10 11:15:51,011 WARN  [org.infinispan.transaction.impl.TransactionTable] (ServerService Thread Pool -- 82) ISPN000100: Stopping, but there are 0 local transactions and 11 remote transactions that did not finish in time.
      2023-10-10 11:15:51,029 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 82) WFLYCLINF0003: Stopped clusterbench-ee10.ear.clusterbench-ee10-web.war cache from web container
      

              pferraro@redhat.com Paul Ferraro
              tborgato@redhat.com Tommaso Borgato
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated:
                Resolved: