Uploaded image for project: 'WildFly'
  1. WildFly
  2. WFLY-6877

Fine session attributes (i.e. replication-granularity = ATTRIBUTE) cause org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds

XMLWordPrintable

      Our system using the following infinispan configuration for web session cache (i.e. an async, non-transactional, batch, optimistic locking dustrubutable cache):

      <cache-container name="web" default-cache="dist" module="org.wildfly.clustering.web.infinispan">
      	<transport lock-timeout="60000"/>
      	<distributed-cache name="dist" mode="ASYNC" l1-lifespan="0" owners="2">
      		<locking isolation="REPEATABLE_READ"/>
      		<transaction locking="OPTIMISTIC" mode="BATCH"/>
      		<file-store/>
      	</distributed-cache>
      </cache-container>
      

      When we perform a load test for one of our business use case in our system, Wildfly always throw the infamous locking for 15 seconds timeout exception:

      2016-07-21 13:01:37,971 TRACE [org.infinispan.util.concurrent.locks.impl.DefaultLockManager] (default task-33) Lock all keys=[4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl, 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY] for owner=GlobalTransaction:<10.1.3.29>:492:local. timeout=15000 (MILLISECONDS)
      2016-07-21 13:01:52,975 ERROR [org.infinispan.interceptors.InvocationContextInterceptor] (default task-33) ISPN000136: Error executing command PrepareCommand, writing keys [4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl, 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY]: org.infinispan.util.concurrent.TimeoutException: ISPN000299: Unable to acquire lock after 15 seconds for key 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY and requestor GlobalTransaction:<10.1.3.29>:492:local. Lock is held by GlobalTransaction:<10.1.3.29>:458:local
      	at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.lock(DefaultLockManager.java:236)
      	at org.infinispan.util.concurrent.locks.impl.DefaultLockManager$CompositeLockPromise.lock(DefaultLockManager.java:305)
      	at org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockAllAndRecord(AbstractLockingInterceptor.java:199)
      	at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.checkPendingAndLockAllKeys(AbstractTxLockingInterceptor.java:199)
      	at org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAllOrRegisterBackupLock(AbstractTxLockingInterceptor.java:165)
      	at org.infinispan.interceptors.locking.OptimisticLockingInterceptor.visitPrepareCommand(OptimisticLockingInterceptor.java:69)
      	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
      	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
      	at org.infinispan.interceptors.TxInterceptor.invokeNextInterceptorAndVerifyTransaction(TxInterceptor.java:157)
      	at org.infinispan.interceptors.TxInterceptor.visitPrepareCommand(TxInterceptor.java:144)
      	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
      	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
      	at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:113)
      	at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)
      	at org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitPrepareCommand(TransactionSynchronizerInterceptor.java:42)
      	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
      	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
      	at org.infinispan.statetransfer.StateTransferInterceptor.handleTxCommand(StateTransferInterceptor.java:238)
      	at org.infinispan.statetransfer.StateTransferInterceptor.visitPrepareCommand(StateTransferInterceptor.java:87)
      	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
      	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
      	at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:107)
      	at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:76)
      	at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)
      	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
      	at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:99)
      	at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:113)
      	at org.infinispan.commands.AbstractVisitor.visitPrepareCommand(AbstractVisitor.java:112)
      	at org.infinispan.commands.tx.PrepareCommand.acceptVisitor(PrepareCommand.java:173)
      	at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:336)
      	at org.infinispan.transaction.impl.TransactionCoordinator.commit(TransactionCoordinator.java:157)
      	at org.infinispan.transaction.xa.TransactionXaAdapter.commit(TransactionXaAdapter.java:114)
      	at org.infinispan.transaction.tm.DummyTransaction.finishResource(DummyTransaction.java:389)
      	at org.infinispan.transaction.tm.DummyTransaction.commitResources(DummyTransaction.java:435)
      	at org.infinispan.transaction.tm.DummyTransaction.runCommit(DummyTransaction.java:314)
      	at org.infinispan.transaction.tm.DummyTransaction.commit(DummyTransaction.java:105)
      	at org.infinispan.transaction.tm.DummyBaseTransactionManager.commit(DummyBaseTransactionManager.java:73)
      	at org.wildfly.clustering.ee.infinispan.ActiveTransactionBatch.close(ActiveTransactionBatch.java:48)
      	at org.wildfly.clustering.web.undertow.session.DistributableSession.requestDone(DistributableSession.java:77)
      	at io.undertow.servlet.spec.ServletContextImpl.updateSessionAccessTime(ServletContextImpl.java:768)
      	at io.undertow.servlet.spec.HttpServletResponseImpl.responseDone(HttpServletResponseImpl.java:557)
      	at io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(ServletInitialHandler.java:331)
      	at io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(ServletInitialHandler.java:263)
      	at io.undertow.servlet.handlers.ServletInitialHandler.access$000(ServletInitialHandler.java:81)
      	at io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(ServletInitialHandler.java:174)
      	at io.undertow.server.Connectors.executeRootHandler(Connectors.java:202)
      	at io.undertow.server.HttpServerExchange$1.run(HttpServerExchange.java:793)
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      	at java.lang.Thread.run(Thread.java:745)
      

      After tracing using the trace log, we found the time difference between a "Lock all keys" log statement and the "Release locks for keys" log statement takes for about 15~20 seconds, so if another concurrent request would like to lock the same key within that interval, it will result in the timeout exception, example as below:

      2016-07-21 13:01:37,958 TRACE [org.infinispan.util.concurrent.locks.impl.DefaultLockManager] (default task-37) Lock all keys=[4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->XXX, 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY, 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->ZZZ, 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl] for owner=GlobalTransaction:<10.1.3.29>:458:local. timeout=15000 (MILLISECONDS)
      2016-07-21 13:01:53,364 TRACE [org.infinispan.util.concurrent.locks.impl.DefaultLockManager] (default task-37) Release locks for keys=[4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->XXX, 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY, 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->ZZZ, 4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl]. owner=GlobalTransaction:<10.1.3.29>:458:local
      

      We think that is strange as those session keys are POJO without complex nested objects, and we only perform read operation on those objects. Although we know objects without marked @org.wildfly.clustering.web.annotation.Immutable will perform cache sync, we believe that should not took for more than 1 second to commit the cache but it takes for more than 15 seconds...

      After future tracing, we found the following log message on another cluster node:

      2016-07-21 13:01:53,410 TRACE [org.infinispan.commands.tx.PrepareCommand] (Incoming-10,ee,10.1.3.30) {} Invoking remotely originated prepare: PrepareCommand {modifications=[PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->XXX, value=[B@f9f09c2, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}, PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY, value=[B@f72eca9, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}, PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->ZZZ, value=[B@6f108f88, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}, PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl, value=org.wildfly.clustering.web.infinispan.session.SimpleSessionAccessMetaData@126bb5d4, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}, PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->XXX, value=[B@1a574fb5, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}, PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->YYY, value=[B@3141bcb, flags=[IGNORE_RETURN_VALUES], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}, PutKeyValueCommand{key=4V3S9Lw0pGPAK3wt8bYEt3BK06iimxk88geCH2Gl->ZZZ.menuName, value=[B@ae3a302, flags=[FORCE_SYNCHRONOUS], putIfAbsent=false, valueMatcher=MATCH_ALWAYS, metadata=EmbeddedExpirableMetadata{lifespan=-1, maxIdle=-1, version=null}, successful=true}, ...], onePhaseCommit=true, retried=false, gtx=GlobalTransaction:<10.1.3.29>:458:remote, cacheName='app.war', topologyId=2} with invocation context: org.infinispan.context.impl.RemoteTxInvocationContext@3d21e5e9
      

      In general, it synchronized XXX key 2 times, YYY key 2272 times, ZZZ key 3 times!!!

      The root cause should be this line: https://github.com/wildfly/wildfly/blob/10.0.0.Final/clustering/web/infinispan/src/main/java/org/wildfly/clustering/web/infinispan/session/fine/FineSessionAttributes.java#L76 where a new CacheEntryMutator.mutate() will trigger a new PutKeyValueCommand. For consecutive get call on the same session key, it will trigger the PutKeyValueCommand multiple times, but that is not necessary for transactional/ batched cache in general as the MarshalledValue is lazily serialized, the actual value is pointing to the same instance.

      Attached the suggested change, which solved our issue. Basically use a Set to track the mutated attribute names, to ensure the command is only issued once. Tested on non-transactional, batched cache (non-transactional, non-batch cache may not work as I did not think about that...)

        1. FineSessionAttributes.java
          4 kB
          Sammy Chu
        2. FineSessionAttributes.java
          4 kB
          Sammy Chu

              pferraro@redhat.com Paul Ferraro
              sammyhk Sammy Chu
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Created:
                Updated:
                Resolved: