JBoss Enterprise Application Platform / JBEAP-18592

Clustering - org.infinispan.util.concurrent.TimeoutException before node fail, in JDG scenario


      We observed these errors in a clustering fail-over test where nodes are failed via JVM kill:

      2020-01-30 08:55:58,452 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (async-thread--p26-t1) ISPN000136: Error executing command RemoveCommand on Cache 'clusterbench-ee8.ear.a.war', writing keys [SessionCreationMetaDataKey(7DW-VFPUGOVPBF1VymSNzkpZurUyiXUF4CXKk-L5)]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key SessionCreationMetaDataKey(7DW-VFPUGOVPBF1VymSNzkpZurUyiXUF4CXKk-L5) and requestor GlobalTx:wildfly1:117. Lock is held by GlobalTx:wildfly2:70
      	at org.infinispan@9.4.16.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:288)
      	at org.infinispan@9.4.16.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:218)
      	at org.infinispan@9.4.16.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:436)
      	at org.infinispan@9.4.16.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.lambda$toInvocationStage$3(InfinispanLock.java:412)
      	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
      	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:478)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
      	at org.wildfly.clustering.service@7.3.0.GA-redhat-00003//org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
      	at java.base/java.lang.Thread.run(Thread.java:834)
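When triaging many of these entries, it can help to pull out which transaction was waiting and which one held the lock. The following is a minimal sketch, assuming the ISPN000299 message always has the wording shown in the entry above; the function name `parse_lock_timeout` is hypothetical and not part of Infinispan or WildFly.

```python
import re

# Pattern for the ISPN000299 message text exactly as quoted in this report.
# Assumption: other occurrences in the logs use the same wording.
LOCK_TIMEOUT = re.compile(
    r"ISPN000299: Unable to acquire lock after (?P<timeout>.+?) "
    r"for key (?P<key>.+?) and requestor (?P<requestor>\S+?)\. "
    r"Lock is held by (?P<holder>\S+)"
)


def parse_lock_timeout(line):
    """Return timeout, key, requestor and holder from a log line, or None."""
    match = LOCK_TIMEOUT.search(line)
    return match.groupdict() if match else None


# Message text copied from the log entry in this report.
sample = (
    "ISPN000299: Unable to acquire lock after 15 seconds for key "
    "SessionCreationMetaDataKey(7DW-VFPUGOVPBF1VymSNzkpZurUyiXUF4CXKk-L5) "
    "and requestor GlobalTx:wildfly1:117. Lock is held by GlobalTx:wildfly2:70"
)
info = parse_lock_timeout(sample)
print(info["requestor"], "blocked by", info["holder"])
```

In the entry above, the requestor is a transaction on wildfly1 and the holder is a transaction on wildfly2, so the contention is between the two EAP nodes for the same session key.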
      

      In these tests, a 2-node EAP cluster offloads session data to a 2-node JDG cluster. Here is the CLI script used to configure the EAP nodes:

      embed-server --server-config=standalone-ha.xml
      /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
      /subsystem=transactions:write-attribute(name=node-identifier,value=wildfly2)
      /socket-binding-group=standard-sockets/remote-destination-outbound-socket-binding=remote-jdg-server1:add(host=10.0.147.197, port=11222)
      /socket-binding-group=standard-sockets/remote-destination-outbound-socket-binding=remote-jdg-server2:add(host=10.0.147.209, port=11222)
      batch
      /subsystem=infinispan/remote-cache-container=web-sessions:add(default-remote-cluster=jdg-server-cluster)
      /subsystem=infinispan/remote-cache-container=web-sessions/remote-cluster=jdg-server-cluster:add(socket-bindings=[remote-jdg-server1,remote-jdg-server2])
      run-batch
      /subsystem=infinispan/cache-container=web/invalidation-cache=offload_ic:add()
      /subsystem=infinispan/cache-container=web/invalidation-cache=offload_ic/store=hotrod:add(remote-cache-container=web-sessions, fetch-state=false, preload=false, passivation=false, purge=false, shared=false)
      /subsystem=infinispan/cache-container=web/invalidation-cache=offload_ic/component=locking:add(isolation=REPEATABLE_READ)
      /subsystem=infinispan/cache-container=web/invalidation-cache=offload_ic/component=transaction:add(mode=BATCH)
      /subsystem=infinispan/cache-container=web:write-attribute(name=default-cache, value=offload_ic)
      

      The overall failure rate increases from 0.6% to 0.7%, which is still below the 2% threshold at which the test fails.
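The pass/fail arithmetic can be sketched as below. This is only an illustration: the helper `within_threshold` is hypothetical, and it assumes the test fails once the rate reaches 2% (the report does not say whether the boundary itself counts as a failure).

```python
def within_threshold(fail_rate_pct, threshold_pct=2.0):
    """True while the failure rate stays below the threshold (assumed strict)."""
    return fail_rate_pct < threshold_pct


baseline_pct = 0.6   # rate before the regression, from the report
observed_pct = 0.7   # rate with the new errors, from the report

print("test passes:", within_threshold(observed_pct))
print("increase in percentage points:", round(observed_pct - baseline_pct, 1))
```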

      What makes these errors worth noting is that they happen BEFORE the EAP nodes are failed; this did not happen in 7.2.
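One way to confirm the "before the nodes are failed" observation against the attached logs is to compare each error timestamp with the time of the first JVM kill. The sketch below assumes the `YYYY-MM-DD HH:MM:SS,mmm` prefix seen in the log entry above. The function `errors_before`, the kill time, and the second sample line are all hypothetical; the real kill time would have to come from the test harness.

```python
from datetime import datetime

# Timestamp prefix format of the server.log entry quoted in this report.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def errors_before(lines, kill_time, marker="ISPN000299"):
    """Return the lines containing marker that were logged before kill_time."""
    early = []
    for line in lines:
        if marker not in line:
            continue
        try:
            # The timestamp occupies the first 23 characters of an entry.
            logged_at = datetime.strptime(line[:23], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # not the start of an entry, e.g. a stack-trace line
        if logged_at < kill_time:
            early.append(line)
    return early


lines = [
    # Abridged from the entry in this report.
    "2020-01-30 08:55:58,452 ERROR ... ISPN000299: Unable to acquire lock after 15 seconds",
    # Made-up entry after the kill, only to exercise the filter.
    "2020-01-30 09:05:00,000 ERROR ... ISPN000299: Unable to acquire lock after 15 seconds",
]
first_kill = datetime(2020, 1, 30, 9, 0, 0)  # hypothetical kill time

early = errors_before(lines, first_kill)
print(len(early), "lock timeout(s) logged before the first kill")
```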

      Complete logs attached;
      Test phases overview here;

        1. wlf_20204930-084931-jdg-service-1-server.log (28 kB, Tommaso Borgato)
        2. wlf_20204930-084931-jdg-service-2-server.log (24 kB, Tommaso Borgato)
        3. wlf_20204930-084931-wildfly-service-1-server.log (702 kB, Tommaso Borgato)
        4. wlf_20204930-084931-wildfly-service-2-server.log (1.05 MB, Tommaso Borgato)

              pferraro@redhat.com Paul Ferraro
              tborgato@redhat.com Tommaso Borgato