Uploaded image for project: 'JBoss Enterprise Application Platform'
  1. JBoss Enterprise Application Platform
  2. JBEAP-29906

[CLUSTERING] ISPN000136 / ISPN000299 Errors in Heavy Read Stress Tests

XMLWordPrintable

    • False
    • Hide

      None

      Show
      None
    • False
    • Critical

      In "Heavy Read" tress tests we noticed a lot ISPN000136 / ISPN000299 Errors;

      Looks like this issue was introduced after "8.1.0.Beta-CR9" when we upgraded to Infinispan 15.0.14.Final-redhat-00002;

      Details: we have a 4 nodes cluster with HAPROXY load balancer and we fire requests with a 1 to 10 ratio of read only requests;


      Here is what happens in detail ...

      NODE 4:
      401 new sessions are created "07:52:01,784" --> "07:52:13,625"; among these sessions we also have session "SGGeJv_D2ChHx5eJeBHXs-4aktpp8Hnm7WpBhZLG" which, about ~15 seconds after creation, produces an error:

      2025-04-10 07:52:01,784 INFO  [org.jboss.test.clusterbench.common.session.CommonHttpSessionServlet] (default task-9) New session created: tHl6cUoNvIN6ytmTN-MQTfv_5Vok8boOGba4bNeo with 4kB cargo
      ...
      2025-04-10 07:52:01,784 INFO  [org.jboss.test.clusterbench.common.session.CommonHttpSessionServlet] (default task-25) New session created: SGGeJv_D2ChHx5eJeBHXs-4aktpp8Hnm7WpBhZLG with 4kB cargo
      ...
      2025-04-10 07:52:13,625 INFO  [org.jboss.test.clusterbench.common.session.CommonHttpSessionServlet] (default task-104) New session created: LmfD5yuwCEcbgdgVNkA4AZ0e2Yq_mfuxJLzWSH82 with 4kB cargo
      

      Complete Error:

      2025-04-10 07:52:17,901 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (non-blocking-thread-wildfly4-p5-t208) ISPN000136: Error executing command GetKeyValueCommand on Cache 'clusterbench-ee10.ear.clusterbench-ee10-web.war', writing keys []: org.infinispan.commons.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key SessionMetaDataKey(SGGeJv_D2ChHx5eJeBHXs-4aktpp8Hnm7WpBhZLG) and requestor GlobalTransaction{id=346, addr=wildfly4, remote=false, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=170, addr=wildfly4, remote=false, xid=null, internalId=-1}
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:299)
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:229)
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:554)
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.lambda$toInvocationStage$3(InfinispanLock.java:484)
      	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
      	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      	at org.wildfly.clustering.context@5.0.4.Final-redhat-00001//org.wildfly.clustering.context.ContextualExecutor$1.execute(ContextualExecutor.java:180)
      	at org.wildfly.clustering.context@5.0.4.Final-redhat-00001//org.wildfly.clustering.context.ContextualExecutor.execute(ContextualExecutor.java:31)
      	at org.wildfly.clustering.context@5.0.4.Final-redhat-00001//org.wildfly.clustering.context.Contextualizer$2$1.run(Contextualizer.java:220)
      	at java.base/java.lang.Thread.run(Thread.java:840)
      
      2025-04-10 07:52:17,905 ERROR [io.undertow.servlet.request] (default task-50) UT015005: Error invoking method requestInitialized on listener class org.jboss.weld.module.web.servlet.WeldInitialListener: java.util.concurrent.CompletionException: org.infinispan.commons.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key SessionMetaDataKey(SGGeJv_D2ChHx5eJeBHXs-4aktpp8Hnm7WpBhZLG) and requestor GlobalTransaction{id=346, addr=wildfly4, remote=false, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=170, addr=wildfly4, remote=false, xid=null, internalId=-1}
      	at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
      	at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
      	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
      	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
      	at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2162)
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.interceptors.impl.QueueAsyncInvocationStage.invokeQueuedHandlers(QueueAsyncInvocationStage.java:114)
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:87)
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.interceptors.impl.QueueAsyncInvocationStage.accept(QueueAsyncInvocationStage.java:33)
      	at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:863)
      	at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:841)
      	at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
      	at java.base/java.util.concurrent.CompletableFuture.postFire(CompletableFuture.java:614)
      	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:653)
      	at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      	at org.wildfly.clustering.context@5.0.4.Final-redhat-00001//org.wildfly.clustering.context.ContextualExecutor$1.execute(ContextualExecutor.java:180)
      	at org.wildfly.clustering.context@5.0.4.Final-redhat-00001//org.wildfly.clustering.context.ContextualExecutor.execute(ContextualExecutor.java:31)
      	at org.wildfly.clustering.context@5.0.4.Final-redhat-00001//org.wildfly.clustering.context.Contextualizer$2$1.run(Contextualizer.java:220)
      	at java.base/java.lang.Thread.run(Thread.java:840)
      Caused by: org.infinispan.commons.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key SessionMetaDataKey(SGGeJv_D2ChHx5eJeBHXs-4aktpp8Hnm7WpBhZLG) and requestor GlobalTransaction{id=346, addr=wildfly4, remote=false, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=170, addr=wildfly4, remote=false, xid=null, internalId=-1}
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:299)
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:229)
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:554)
      	at org.infinispan.core@15.0.14.Final-redhat-00002//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.lambda$toInvocationStage$3(InfinispanLock.java:484)
      	at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:646)
      	... 7 more
      

      Later in the logs we have the same kind of error for other sessions;

      Later on 150 more sessions are created:

      2025-04-10 07:52:19,481 INFO  [org.jboss.test.clusterbench.common.session.CommonHttpSessionServlet] (default task-144) New session created: PT2hH4ywHSkD_pn3ac0UxkOW0hmyeozNbq5lWMrb with 4kB cargo
      ...
      2025-04-10 07:52:19,655 INFO  [org.jboss.test.clusterbench.common.session.CommonHttpSessionServlet] (default task-209) New session created: WCL5rBbk0O-ZwL89SBqIlwjC34_GedAGlMvopi6v with 4kB cargo
      

      And then, another 939:

      2025-04-10 07:52:25,509 INFO  [org.jboss.test.clusterbench.common.session.CommonHttpSessionServlet] (default task-156) New session created: 7oqqd_09OnvQ3IloWrTIFTOEANdgFN8ONzagT1OX with 4kB cargo
      ...
      2025-04-10 07:52:59,253 INFO  [org.jboss.test.clusterbench.common.session.CommonHttpSessionServlet] (default task-198) New session created: VqISL3xEFuv63CmjTWsJ8_21fqc73ev91wk7bYaq with 4kB cargo
      

      For a total ~1500 sessions ... which is somewhat suspicious becasue when we run with 8.0 we have an even distrubution of session with the same HAPROXY load balancer:

      [8.1.0.Beta-CR11.1]$ ll
      total 20024
      -rw-rw-r--. 1 tborgato tborgato 6592060 Apr 10 08:16 wlf_20254410-074448-wildfly-service-1-server.log
      -rw-rw-r--. 1 tborgato tborgato   34527 Apr 10 08:16 wlf_20254410-074448-wildfly-service-1-standalone-ha.xml
      -rw-rw-r--. 1 tborgato tborgato 1029042 Apr 10 08:16 wlf_20254410-074448-wildfly-service-2-server.log
      -rw-rw-r--. 1 tborgato tborgato   34527 Apr 10 08:16 wlf_20254410-074448-wildfly-service-2-standalone-ha.xml
      -rw-rw-r--. 1 tborgato tborgato 5352844 Apr 10 08:16 wlf_20254410-074448-wildfly-service-3-server.log
      -rw-rw-r--. 1 tborgato tborgato   34527 Apr 10 08:16 wlf_20254410-074448-wildfly-service-3-standalone-ha.xml
      -rw-rw-r--. 1 tborgato tborgato 7375154 Apr 10 08:16 wlf_20254410-074448-wildfly-service-4-server.log
      -rw-rw-r--. 1 tborgato tborgato   34527 Apr 10 08:16 wlf_20254410-074448-wildfly-service-4-standalone-ha.xml
      [8.1.0.Beta-CR11.1]$ grep 'New session created' wlf_20254410-074448-wildfly-service-1-server.log | wc -l
      1628
      [8.1.0.Beta-CR11.1]$ grep 'New session created' wlf_20254410-074448-wildfly-service-2-server.log | wc -l
      1960
      [8.1.0.Beta-CR11.1]$ grep 'New session created' wlf_20254410-074448-wildfly-service-3-server.log | wc -l
      1628
      [8.1.0.Beta-CR11.1]$ grep 'New session created' wlf_20254410-074448-wildfly-service-4-server.log | wc -l
      1490
      [8.0.7.GA-CR1]$ ll
      total 1580
      -rw-rw-r--. 1 tborgato tborgato 378940 Apr 10 12:56 wlf_20252510-122554-wildfly-service-1-server.log
      -rw-rw-r--. 1 tborgato tborgato  34548 Apr 10 12:56 wlf_20252510-122554-wildfly-service-1-standalone-ha.xml
      -rw-rw-r--. 1 tborgato tborgato 366864 Apr 10 12:56 wlf_20252510-122554-wildfly-service-2-server.log
      -rw-rw-r--. 1 tborgato tborgato  34548 Apr 10 12:56 wlf_20252510-122554-wildfly-service-2-standalone-ha.xml
      -rw-rw-r--. 1 tborgato tborgato 363251 Apr 10 12:56 wlf_20252510-122554-wildfly-service-3-server.log
      -rw-rw-r--. 1 tborgato tborgato  34548 Apr 10 12:56 wlf_20252510-122554-wildfly-service-3-standalone-ha.xml
      -rw-rw-r--. 1 tborgato tborgato 354001 Apr 10 12:56 wlf_20252510-122554-wildfly-service-4-server.log
      -rw-rw-r--. 1 tborgato tborgato  34548 Apr 10 12:56 wlf_20252510-122554-wildfly-service-4-standalone-ha.xml
      [8.0.7.GA-CR1]$ grep 'New session created'  wlf_20252510-122554-wildfly-service-1-server.log  | wc -l
      1500
      [8.0.7.GA-CR1]$ grep 'New session created'  wlf_20252510-122554-wildfly-service-2-server.log  | wc -l
      1500
      [8.0.7.GA-CR1]$ grep 'New session created'  wlf_20252510-122554-wildfly-service-3-server.log  | wc -l
      1500
      [8.0.7.GA-CR1]$ grep 'New session created'  wlf_20252510-122554-wildfly-service-4-server.log  | wc -l
      1500
      

      HAPROXY is configured as in the following:

      global
          daemon
      
      defaults
          log global
          option httplog
          option dontlog-normal
      
          timeout connect 2000ms
          timeout client 3000ms
          timeout server 3000ms
          timeout check 800ms
      
      frontend myfrontend
          mode http
          bind *:8080
          default_backend myservers
      
      backend myservers
          mode http
          cookie HAPROXY_AFFINITY_COOKIE indirect preserve
          option redispatch
          option httpchk GET /clusterbench/
          http-check expect status 200
          server wildfly2 10.6.60.176:8080 check cookie wildfly2 observe layer7
          server wildfly1 10.6.60.177:8080 check cookie wildfly1 observe layer7
          server wildfly3 10.6.60.175:8080 check cookie wildfly3 observe layer7
          server wildfly4 10.6.60.174:8080 check cookie wildfly4 observe layer7
      

      EAP Nodes config:

      embed-server --server-config=standalone-ha.xml
      if (outcome != success) of /subsystem=jgroups:read-attribute(name=default-stack)
      /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
      else
      /subsystem=jgroups:write-attribute(name=default-stack,value=tcp)
      /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
      end-if
      batch
      /subsystem=infinispan/cache-container=web/distributed-cache=dist:remove
      /subsystem=infinispan/cache-container=ejb/distributed-cache=dist:remove
      # web cache
      /subsystem=infinispan/cache-container=web/replicated-cache=dist:add()
      /subsystem=infinispan/cache-container=web/replicated-cache=dist/component=locking:add(isolation=REPEATABLE_READ)
      /subsystem=infinispan/cache-container=web/replicated-cache=dist/component=transaction:add(mode=BATCH)
      /subsystem=infinispan/cache-container=web/replicated-cache=dist/store=file:add(purge=true, passivation=true)
      /subsystem=infinispan/cache-container=web:write-attribute(name=default-cache, value=dist)
      # ejb cache
      /subsystem=infinispan/cache-container=ejb/replicated-cache=dist:add()
      /subsystem=infinispan/cache-container=ejb/replicated-cache=dist/component=locking:add(isolation=REPEATABLE_READ)
      /subsystem=infinispan/cache-container=ejb/replicated-cache=dist/component=transaction:add(mode=BATCH)
      /subsystem=infinispan/cache-container=ejb/replicated-cache=dist/store=file:add(purge=true, passivation=true)
      /subsystem=infinispan/cache-container=ejb:write-attribute(name=default-cache, value=dist)
      # AFFINITY_COOKIE
      /subsystem=undertow/servlet-container=default/setting=affinity-cookie:add(name=HAPROXY_AFFINITY_COOKIE)
      /subsystem=undertow:write-attribute(name=instance-id,value=wildflyN)
      

      Please note EAP is configured with a dedicated "HAPROXY_AFFINITY_COOKIE":

              <subsystem xmlns="urn:jboss:domain:undertow:14.0" default-virtual-host="default-host" default-servlet-container="default" default-server="default-server" instance-id="wildfly1" statistics-enabled="${wildfly.undertow.statistics-enabled:${wildfly.statistics-enabled:false}}" default-security-domain="other">
                  <byte-buffer-pool name="default"/>
                  <buffer-cache name="default"/>
                  <server name="default-server">
                      <ajp-listener name="ajp" socket-binding="ajp"/>
                      <http-listener name="default" socket-binding="http" redirect-socket="https" enable-http2="true"/>
                      <https-listener name="https" socket-binding="https" ssl-context="applicationSSC" enable-http2="true"/>
                      <host name="default-host" alias="localhost">
                          <location name="/" handler="welcome-content"/>
                          <http-invoker http-authentication-factory="application-http-authentication"/>
                      </host>
                  </server>
                  <servlet-container name="default">
                      <jsp-config/>
                      <affinity-cookie name="HAPROXY_AFFINITY_COOKIE"/>
                      <websockets/>
                  </servlet-container>
                  <handlers>
                      <file name="welcome-content" path="${jboss.home.dir}/welcome-content"/>
                  </handlers>
                  <application-security-domains>
                      <application-security-domain name="other" security-domain="ApplicationDomain"/>
                  </application-security-domains>
              </subsystem>
      

              pferraro@redhat.com Paul Ferraro
              tborgato@redhat.com Tommaso Borgato
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved: