  JBoss Enterprise Application Platform / JBEAP-29214

[CLUSTERING] "ISPN000136: Error executing command LockControlCommand"


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Affects Version: 8.1.0.Beta
    • Component: Clustering

      Context: we are running a fail-over test on a 4-node cluster with an HAPROXY Load Balancer and JVMKILL-induced fail-over.

      We also observed the issue with the default server configuration.


      granularity="ATTRIBUTE" Scenario (HAPROXY Load Balancer + JVMKILL induced fail-over)

      The first configuration where we observed the issue is the one where each node is configured with granularity="ATTRIBUTE":

              <subsystem xmlns="urn:jboss:domain:distributable-web:4.0" default-session-management="granular" default-single-sign-on-management="default">
                  <infinispan-session-management name="default" cache-container="web" granularity="SESSION">
                      <primary-owner-affinity/>
                  </infinispan-session-management>
                  <infinispan-session-management name="granular" cache-container="web" granularity="ATTRIBUTE">
                      <primary-owner-affinity/>
                  </infinispan-session-management>
                  <infinispan-single-sign-on-management name="default" cache-container="web" cache="sso"/>
                  <infinispan-routing cache-container="web" cache="routing"/>
              </subsystem>
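      For reference, the granular session management shown above can also be defined via the management CLI. This is a sketch only: it assumes the default "web" cache container and the resource layout of the distributable-web subsystem; adjust names to your configuration.

      ```
      embed-server --server-config=standalone-ha.xml
      # add a session management named "granular" that stores each session attribute in its own cache entry
      /subsystem=distributable-web/infinispan-session-management=granular:add(cache-container=web, granularity=ATTRIBUTE)
      # use primary-owner affinity, matching the <primary-owner-affinity/> element in the XML above
      /subsystem=distributable-web/infinispan-session-management=granular/affinity=primary-owner:add()
      # make it the default session management for deployed web applications
      /subsystem=distributable-web:write-attribute(name=default-session-management, value=granular)
      ```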
      

      Moving from "8.0.5.GA-SP1-CR1" to "8.1.0.Beta-CR2", we intermittently observed the "ISPN000299" error approximately 3694 times:

      2025-02-03 05:25:50,330 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (thread-27,ejb,wildfly2) ISPN000136: Error executing command LockControlCommand on Cache 'clusterbench-ee10.ear.clusterbench-ee10-web.war', writing keys []: org.infinispan.commons.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionMetaDataKey(Nv7azanjqIeHW-BRBNjO53IHrzA4a5r_vhEuRqWD) and requestor GlobalTransaction{id=46879, addr=wildfly1, remote=true, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=46878, addr=wildfly1, remote=true, xid=null, internalId=-1}
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:299)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:229)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:554)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:481)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:259)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:283)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockAllAndRecord(AbstractLockingInterceptor.java:293)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.checkPendingAndLockAllKeys(AbstractTxLockingInterceptor.java:157)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAllOrRegisterBackupLock(AbstractTxLockingInterceptor.java:109)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.PessimisticLockingInterceptor.localLockCommandWork(PessimisticLockingInterceptor.java:211)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.PessimisticLockingInterceptor.lambda$new$0(PessimisticLockingInterceptor.java:44)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenApply(BaseAsyncInterceptor.java:85)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:193)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:187)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.impl.TxInterceptor.visitLockControlCommand(TxInterceptor.java:224)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.asyncInvokeNext(BaseAsyncInterceptor.java:231)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitCommand(TransactionSynchronizerInterceptor.java:46)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:189)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:90)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.DDAsyncInterceptor.visitLockControlCommand(DDAsyncInterceptor.java:165)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:127)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:60)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.impl.BatchingInterceptor.handleDefault(BatchingInterceptor.java:74)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.DDAsyncInterceptor.visitLockControlCommand(DDAsyncInterceptor.java:165)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:211)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.invokeAsync(LockControlCommand.java:112)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:108)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:111)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:79)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:48)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.handleRunnable(BasePerCacheInboundInvocationHandler.java:154)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.TxPerCacheInboundInvocationHandler.handle(TxPerCacheInboundInvocationHandler.java:37)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.handleCacheRpcCommand(GlobalInboundInvocationHandler.java:123)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.handleFromCluster(GlobalInboundInvocationHandler.java:77)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport.processRequest(JGroupsTransport.java:1527)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1454)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.lambda$up$0(JGroupsTransport.java:1663)
      	at java.base/java.lang.Iterable.forEach(Iterable.java:75)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1655)
      	at org.jgroups@5.3.13.Final//org.jgroups.JChannel.up(JChannel.java:764)
      	at org.jgroups@5.3.13.Final//org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:162)
      	at org.jgroups@5.3.13.Final//org.jgroups.stack.Protocol.up(Protocol.java:392)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.FORK.up(FORK.java:175)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.FRAG2.up(FRAG2.java:161)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.FlowControl.up(FlowControl.java:319)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.FlowControl.up(FlowControl.java:319)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.pbcast.GMS.up(GMS.java:867)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:255)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.UNICAST3.deliverBatch(UNICAST3.java:1169)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.UNICAST3.handleBatchReceived(UNICAST3.java:941)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.UNICAST3.up(UNICAST3.java:570)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:711)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.VERIFY_SUSPECT2.up(VERIFY_SUSPECT2.java:119)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.FailureDetection.up(FailureDetection.java:193)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.FD_SOCK2.up(FD_SOCK2.java:204)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.MERGE3.up(MERGE3.java:288)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.Discovery.up(Discovery.java:316)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.RED.up(RED.java:123)
      	at org.jgroups@5.3.13.Final//org.jgroups.protocols.TP.passBatchUp(TP.java:1269)
      	at org.jgroups@5.3.13.Final//org.jgroups.util.SubmitToThreadPool$BatchHandler.passBatchUp(SubmitToThreadPool.java:137)
      	at org.jgroups@5.3.13.Final//org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:133)
      	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
      	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
      	at org.wildfly.clustering.context@5.0.4.Final//org.wildfly.clustering.context.ContextualExecutor$1.execute(ContextualExecutor.java:180)
      	at org.wildfly.clustering.context@5.0.4.Final//org.wildfly.clustering.context.ContextualExecutor.execute(ContextualExecutor.java:31)
      	at org.wildfly.clustering.context@5.0.4.Final//org.wildfly.clustering.context.Contextualizer$2$1.run(Contextualizer.java:220)
      	at java.base/java.lang.Thread.run(Thread.java:840)
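      The timeout message reports "after 0 milliseconds", i.e. the lock-acquisition budget was already exhausted when the remote LockControlCommand arrived. For diagnosis, the lock acquisition timeout of the session cache can be read and raised via the CLI. This is a sketch, not a fix for the regression; it assumes the standalone-ha default "dist" cache in the "web" container, and raising the timeout may mask rather than cure the contention.

      ```
      # inspect the current lock acquisition timeout (milliseconds)
      /subsystem=infinispan/cache-container=web/distributed-cache=dist/component=locking:read-attribute(name=acquire-timeout)
      # raise it for testing purposes only
      /subsystem=infinispan/cache-container=web/distributed-cache=dist/component=locking:write-attribute(name=acquire-timeout, value=30000)
      ```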
      

      The error appears on the second node of the cluster at a moment when all 4 nodes are up and running, after the third node has been killed and restarted and before the fourth node is killed.

      When we observe the error there is a slight degradation in the client failure rate: from ~0.001% to ~0.004%.


      Asymmetric Encryption Scenario (HAPROXY Load Balancer + JVMKILL induced fail-over)

      The second configuration where we observed the issue is the one where each node is configured with asymmetric encryption of cluster traffic:

      embed-server --server-config=standalone-ha.xml
      /subsystem=infinispan/cache-container=web/distributed-cache=testDist:add()
      /subsystem=infinispan/cache-container=web/distributed-cache=testDist/component=locking:write-attribute(name=isolation, value=REPEATABLE_READ)
      /subsystem=infinispan/cache-container=web/distributed-cache=testDist/component=transaction:write-attribute(name=mode, value=BATCH)
      /subsystem=infinispan/cache-container=web:write-attribute(name=default-cache, value=testDist)
      # ASYM_ENCRYPT: Configured with algorithms and key sizes: secret key is generated and distributed by coordinator
      batch
      /subsystem=jgroups/stack=udp/protocol=AUTH:add(add-index=8)
      /subsystem=jgroups/stack=udp/protocol=AUTH/token=digest:add(algorithm=SHA-512, shared-secret-reference={clear-text=123PIPPOBAUDO})
      /subsystem=jgroups/stack=tcp/protocol=AUTH:add(add-index=8)
      /subsystem=jgroups/stack=tcp/protocol=AUTH/token=digest:add(algorithm=SHA-512, shared-secret-reference={clear-text=123PIPPOBAUDO})
      run-batch
      /subsystem=jgroups/stack=udp/protocol=ASYM_ENCRYPT:add(add-index=5,properties={sym_keylength=128,sym_algorithm=AES/ECB/PKCS5Padding,asym_keylength=512,asym_algorithm=RSA})
      /subsystem=jgroups/stack=tcp/protocol=ASYM_ENCRYPT:add(add-index=5,properties={sym_keylength=128,sym_algorithm=AES/ECB/PKCS5Padding,asym_keylength=512,asym_algorithm=RSA})
      # we had an error with UDP on PSI
      if (outcome != success) of /subsystem=jgroups:read-attribute(name=default-stack)
      /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
      else
      /subsystem=jgroups:write-attribute(name=default-stack,value=tcp)
      /subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
      end-if
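      After running the script above, the resulting protocol list and the ASYM_ENCRYPT settings can be read back to verify placement (a sketch; read-only operations against the tcp stack):

      ```
      # list the protocols of the tcp stack to confirm AUTH and ASYM_ENCRYPT positions
      /subsystem=jgroups/stack=tcp:read-children-names(child-type=protocol)
      # dump the ASYM_ENCRYPT configuration, including its properties
      /subsystem=jgroups/stack=tcp/protocol=ASYM_ENCRYPT:read-resource(recursive=true)
      ```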
      

      Moving from "8.0.5.GA-SP1-CR1" to "8.1.0.Beta-CR2" we observed the "ISPN000299" error up to ~2000 times per test session; it appears to be triggered on a node when another node re-joins the cluster after temporarily leaving it (see e.g. test run #287 linked below in comments):

      2025-02-03 09:37:39,943 INFO  [org.infinispan.CLUSTER] (thread-66,ejb,wildfly3) ISPN100000: Node wildfly4 joined the cluster
      ...
      2025-02-03 09:37:41,558 INFO  [org.infinispan.LIFECYCLE] (thread-17,ejb,wildfly3) [Context=clusterbench-ee10.ear/clusterbench-ee10-ejb.jar/default] ISPN100010: Finished rebalance with members [wildfly1, wildfly2, wildfly3, wildfly4], topology id 50
      2025-02-03 09:39:21,751 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (thread-97,ejb,wildfly3) ISPN000136: Error executing command LockControlCommand on Cache 'clusterbench-ee10.ear.clusterbench-ee10-web.war', writing keys []: org.infinispan.commons.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionMetaDataKey(84HlxgemQlqvNQBbOv2a0n4fVkfz7D-OJCrjkfCy) and requestor GlobalTransaction{id=56952, addr=wildfly2, remote=true, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=56951, addr=wildfly2, remote=true, xid=null, internalId=-1}
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:299)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:229)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:554)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:481)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:259)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:283)
      

      HAPROXY Load Balancer + JVMKILL induced fail-over

      The third scenario where we spotted the new ISPN000136 error is the default server configuration, with the cluster fronted by an HAPROXY Load Balancer.

      Same as above, the error is triggered by a node re-joining the cluster after it was killed, and it appears on some other node (see test run #306 linked in comments):

      2025-02-03 00:16:01,036 INFO  [org.infinispan.CLUSTER] (thread-47,ejb,wildfly3) ISPN100000: Node wildfly4 joined the cluster
      ...
      2025-02-03 00:17:40,385 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (thread-33,ejb,wildfly3) ISPN000136: Error executing command LockControlCommand on Cache 'clusterbench-ee10.ear.clusterbench-ee10-web.war', writing keys []: org.infinispan.commons.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionMetaDataKey(8Ca8HLwwLTT1pqmp7FJLkztThB4jnfZye76VXqOX) and requestor GlobalTransaction{id=89144, addr=wildfly1, remote=true, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=999518, addr=wildfly2, remote=true, xid=null, internalId=-1}
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:299)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:229)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:554)
      	at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:481)
      

              Assignee: Radoslav Husar (rhn-engineering-rhusar)
              Reporter: Tommaso Borgato (tborgato@redhat.com)
              Votes: 0
              Watchers: 2