JBEAP-18594 (JBoss Enterprise Application Platform)

Clustering - org.infinispan.util.concurrent.TimeoutException in JDG Stress tests

      We observed these errors in clustering stress tests in which a 3-node EAP cluster offloads session data to a 2-node JDG cluster:

      2020-01-30 11:55:03,427 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (async-thread--p26-t1) ISPN000136: Error executing command RemoveCommand on Cache 'clusterbench-ee8.ear.a.war', writing keys [SessionCreationMetaDataKey(rWHFsds9Wvx7HGu_gmYd3oiyjukE0YM8HZZaAd0T)]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key SessionCreationMetaDataKey(rWHFsds9Wvx7HGu_gmYd3oiyjukE0YM8HZZaAd0T) and requestor GlobalTx:wildfly1:112. Lock is held by GlobalTx:wildfly3:63
      	at org.infinispan@9.4.16.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288)
      	at org.infinispan@9.4.16.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:218)
      	at org.infinispan@9.4.16.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:436)
      	at org.infinispan@9.4.16.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.lambda$toInvocationStage$3(InfinispanLock.java:412)
      	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
      	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      	at org.wildfly.clustering.service@7.3.0.GA-redhat-00003//org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
      	at java.base/java.lang.Thread.run(Thread.java:834)
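
      When triaging logs like the attached ones, it can help to tally which node requested each contended lock and which node held it. The following is a minimal sketch, not part of the original test suite. The function name `summarize_lock_timeouts` is hypothetical, and the assumption that lock owners are printed as `GlobalTx:<node>:<id>` is inferred only from the single ISPN000299 line shown above.

```python
import re
from collections import Counter

# Matches the ISPN000299 message format seen in the log excerpt above.
LOCK_TIMEOUT = re.compile(
    r"ISPN000299: Unable to acquire lock after (?P<timeout>.+?) "
    r"for key (?P<key>\S+) and requestor (?P<requestor>\S+)\. "
    r"Lock is held by (?P<holder>\S+)"
)


def node_of(owner):
    """Extract the node name from 'GlobalTx:<node>:<id>' (assumed format).

    Falls back to the raw owner string if the format is different.
    """
    parts = owner.split(":")
    return parts[1] if len(parts) >= 3 else owner


def summarize_lock_timeouts(lines):
    """Count ISPN000299 timeouts per (requestor node, holder node) pair."""
    pairs = Counter()
    for line in lines:
        match = LOCK_TIMEOUT.search(line)
        if match is None:
            continue  # not a lock-acquisition timeout
        pair = (node_of(match.group("requestor")), node_of(match.group("holder")))
        pairs[pair] += 1
    return pairs
```

      Passing an open log file (for example `summarize_lock_timeouts(open("server.log"))`, where `server.log` is a placeholder path) yields a counter keyed by (requestor, holder) pairs, which shows whether the contention is concentrated between particular nodes.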
      

      The EAP nodes are configured as follows:

      embed-server --server-config=standalone-ha.xml
      /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=udp)
      /subsystem=transactions:write-attribute(name=node-identifier,value=wildfly2)
      /socket-binding-group=standard-sockets/remote-destination-outbound-socket-binding=remote-jdg-server1:add(host=10.16.176.58, port=11222)
      /socket-binding-group=standard-sockets/remote-destination-outbound-socket-binding=remote-jdg-server2:add(host=10.16.176.56, port=11222)
      batch
      /subsystem=infinispan/remote-cache-container=web-sessions:add(default-remote-cluster=jdg-server-cluster)
      /subsystem=infinispan/remote-cache-container=web-sessions/remote-cluster=jdg-server-cluster:add(socket-bindings=[remote-jdg-server1,remote-jdg-server2])
      run-batch
      /subsystem=infinispan/cache-container=web/invalidation-cache=offload_ic:add()
      /subsystem=infinispan/cache-container=web/invalidation-cache=offload_ic/store=hotrod:add(remote-cache-container=web-sessions, fetch-state=false, preload=false, passivation=false, purge=false, shared=false)
      /subsystem=infinispan/cache-container=web/invalidation-cache=offload_ic/component=locking:add(isolation=REPEATABLE_READ)
      /subsystem=infinispan/cache-container=web/invalidation-cache=offload_ic/component=transaction:add(mode=BATCH)
      /subsystem=infinispan/cache-container=web:write-attribute(name=default-cache, value=offload_ic)
      

      The problem is that these exceptions occur even though we do not fail any nodes during the stress tests.
      They make the performance results unreliable, because they could mask an actual performance issue.

      EAP and JDG log files are attached.
      Complete test run here.

              pferraro@redhat.com Paul Ferraro
              tborgato@redhat.com Tommaso Borgato