Type: Bug
Resolution: Done
Priority: Critical
Affects Version: 8.1.0.Beta
Context: we are running a fail-over test on a 4-node cluster with an HAPROXY Load Balancer and JVMKILL-induced fail-over;
We also spotted the issue with the default server configuration;
granularity="ATTRIBUTE" Scenario (HAPROXY Load Balancer + JVMKILL induced fail-over)
The first configuration where we observed the issue is the one where each node is configured to use granularity="ATTRIBUTE":
<subsystem xmlns="urn:jboss:domain:distributable-web:4.0" default-session-management="granular" default-single-sign-on-management="default">
    <infinispan-session-management name="default" cache-container="web" granularity="SESSION">
        <primary-owner-affinity/>
    </infinispan-session-management>
    <infinispan-session-management name="granular" cache-container="web" granularity="ATTRIBUTE">
        <primary-owner-affinity/>
    </infinispan-session-management>
    <infinispan-single-sign-on-management name="default" cache-container="web" cache="sso"/>
    <infinispan-routing cache-container="web" cache="routing"/>
</subsystem>
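For reference, the same granular session management can be sketched via the JBoss CLI instead of editing the XML directly; the resource paths below follow the standard distributable-web subsystem layout, but this is an untested sketch, not the exact commands we ran:

```
# add an ATTRIBUTE-granularity session management named "granular" backed by the "web" cache container
/subsystem=distributable-web/infinispan-session-management=granular:add(cache-container=web, granularity=ATTRIBUTE)
# use primary-owner affinity, matching the <primary-owner-affinity/> element above
/subsystem=distributable-web/infinispan-session-management=granular/affinity=primary-owner:add()
# make it the default session management for deployments
/subsystem=distributable-web:write-attribute(name=default-session-management, value=granular)
```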
Moving from "8.0.5.GA-SP1-CR1" to "8.1.0.Beta-CR2" we intermittently observed the "ISPN000299" error, approximately 3694 times:
2025-02-03 05:25:50,330 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (thread-27,ejb,wildfly2) ISPN000136: Error executing command LockControlCommand on Cache 'clusterbench-ee10.ear.clusterbench-ee10-web.war', writing keys []: org.infinispan.commons.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionMetaDataKey(Nv7azanjqIeHW-BRBNjO53IHrzA4a5r_vhEuRqWD) and requestor GlobalTransaction{id=46879, addr=wildfly1, remote=true, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=46878, addr=wildfly1, remote=true, xid=null, internalId=-1}
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:299)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:229)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:554)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:481)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:259)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:283)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.AbstractLockingInterceptor.lockAllAndRecord(AbstractLockingInterceptor.java:293)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.checkPendingAndLockAllKeys(AbstractTxLockingInterceptor.java:157)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.AbstractTxLockingInterceptor.lockAllOrRegisterBackupLock(AbstractTxLockingInterceptor.java:109)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.PessimisticLockingInterceptor.localLockCommandWork(PessimisticLockingInterceptor.java:211)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.PessimisticLockingInterceptor.lambda$new$0(PessimisticLockingInterceptor.java:44)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextThenApply(BaseAsyncInterceptor.java:85)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.locking.PessimisticLockingInterceptor.visitLockControlCommand(PessimisticLockingInterceptor.java:193)
at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:187)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.impl.TxInterceptor.visitLockControlCommand(TxInterceptor.java:224)
at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.asyncInvokeNext(BaseAsyncInterceptor.java:231)
at org.infinispan.core@15.0.11.Final//org.infinispan.statetransfer.TransactionSynchronizerInterceptor.visitCommand(TransactionSynchronizerInterceptor.java:46)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndHandle(BaseAsyncInterceptor.java:189)
at org.infinispan.core@15.0.11.Final//org.infinispan.statetransfer.StateTransferInterceptor.visitLockControlCommand(StateTransferInterceptor.java:90)
at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:58)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.DDAsyncInterceptor.handleDefault(DDAsyncInterceptor.java:54)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.DDAsyncInterceptor.visitLockControlCommand(DDAsyncInterceptor.java:165)
at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNextAndExceptionally(BaseAsyncInterceptor.java:127)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.impl.InvocationContextInterceptor.visitCommand(InvocationContextInterceptor.java:90)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.BaseAsyncInterceptor.invokeNext(BaseAsyncInterceptor.java:60)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.impl.BatchingInterceptor.handleDefault(BatchingInterceptor.java:74)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.DDAsyncInterceptor.visitLockControlCommand(DDAsyncInterceptor.java:165)
at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.acceptVisitor(LockControlCommand.java:102)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.DDAsyncInterceptor.visitCommand(DDAsyncInterceptor.java:50)
at org.infinispan.core@15.0.11.Final//org.infinispan.interceptors.impl.AsyncInterceptorChainImpl.invokeAsync(AsyncInterceptorChainImpl.java:211)
at org.infinispan.core@15.0.11.Final//org.infinispan.commands.control.LockControlCommand.invokeAsync(LockControlCommand.java:112)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.invokeCommand(BasePerCacheInboundInvocationHandler.java:108)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.invoke(BaseBlockingRunnable.java:111)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.runAsync(BaseBlockingRunnable.java:79)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BaseBlockingRunnable.run(BaseBlockingRunnable.java:48)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.BasePerCacheInboundInvocationHandler.handleRunnable(BasePerCacheInboundInvocationHandler.java:154)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.TxPerCacheInboundInvocationHandler.handle(TxPerCacheInboundInvocationHandler.java:37)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.handleCacheRpcCommand(GlobalInboundInvocationHandler.java:123)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler.handleFromCluster(GlobalInboundInvocationHandler.java:77)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport.processRequest(JGroupsTransport.java:1527)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport.processMessage(JGroupsTransport.java:1454)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.lambda$up$0(JGroupsTransport.java:1663)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.infinispan.core@15.0.11.Final//org.infinispan.remoting.transport.jgroups.JGroupsTransport$ChannelCallbacks.up(JGroupsTransport.java:1655)
at org.jgroups@5.3.13.Final//org.jgroups.JChannel.up(JChannel.java:764)
at org.jgroups@5.3.13.Final//org.jgroups.fork.ForkProtocolStack.up(ForkProtocolStack.java:162)
at org.jgroups@5.3.13.Final//org.jgroups.stack.Protocol.up(Protocol.java:392)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.FORK.up(FORK.java:175)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.FRAG2.up(FRAG2.java:161)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.FlowControl.up(FlowControl.java:319)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.FlowControl.up(FlowControl.java:319)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.pbcast.GMS.up(GMS.java:867)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:255)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.UNICAST3.deliverBatch(UNICAST3.java:1169)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.UNICAST3.handleBatchReceived(UNICAST3.java:941)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.UNICAST3.up(UNICAST3.java:570)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:711)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.VERIFY_SUSPECT2.up(VERIFY_SUSPECT2.java:119)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.FailureDetection.up(FailureDetection.java:193)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.FD_SOCK2.up(FD_SOCK2.java:204)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.MERGE3.up(MERGE3.java:288)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.Discovery.up(Discovery.java:316)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.RED.up(RED.java:123)
at org.jgroups@5.3.13.Final//org.jgroups.protocols.TP.passBatchUp(TP.java:1269)
at org.jgroups@5.3.13.Final//org.jgroups.util.SubmitToThreadPool$BatchHandler.passBatchUp(SubmitToThreadPool.java:137)
at org.jgroups@5.3.13.Final//org.jgroups.util.SubmitToThreadPool$BatchHandler.run(SubmitToThreadPool.java:133)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at org.wildfly.clustering.context@5.0.4.Final//org.wildfly.clustering.context.ContextualExecutor$1.execute(ContextualExecutor.java:180)
at org.wildfly.clustering.context@5.0.4.Final//org.wildfly.clustering.context.ContextualExecutor.execute(ContextualExecutor.java:31)
at org.wildfly.clustering.context@5.0.4.Final//org.wildfly.clustering.context.Contextualizer$2$1.run(Contextualizer.java:220)
at java.base/java.lang.Thread.run(Thread.java:840)
The error pops up on the second node of the cluster at a moment when all 4 nodes are up and running, after the third node has been killed and restarted and before the fourth node is killed;
When we observe the error, there is a slight degradation in the client failure rate: from ~0.001% to ~0.004%;
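Note that the "after 0 milliseconds" in the trace indicates a zero-wait lock attempt rather than the cache's configured acquisition timeout expiring. For completeness, the configured timeout on the session cache can be inspected (and, if desired, raised) from the CLI; the cache name "dist" below is the usual default distributed cache of the "web" container and is an assumption here, and it is not established that this attribute governs the zero-millisecond attempts seen above:

```
# inspect the locking configuration of the web session cache (cache name "dist" assumed)
/subsystem=infinispan/cache-container=web/distributed-cache=dist/component=locking:read-resource
# raise the lock acquisition timeout (milliseconds); may not affect the 0 ms attempts in this trace
/subsystem=infinispan/cache-container=web/distributed-cache=dist/component=locking:write-attribute(name=acquire-timeout, value=30000)
```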
Asymmetric Encryption Scenario (HAPROXY Load Balancer + JVMKILL induced fail-over)
The second configuration where we observed the issue is the one where each node is configured with asymmetric encryption of cluster traffic:
embed-server --server-config=standalone-ha.xml
/subsystem=infinispan/cache-container=web/distributed-cache=testDist:add()
/subsystem=infinispan/cache-container=web/distributed-cache=testDist/component=locking:write-attribute(name=isolation, value=REPEATABLE_READ)
/subsystem=infinispan/cache-container=web/distributed-cache=testDist/component=transaction:write-attribute(name=mode, value=BATCH)
/subsystem=infinispan/cache-container=web:write-attribute(name=default-cache, value=testDist)
# ASYM_ENCRYPT: Configured with algorithms and key sizes: secret key is generated and distributed by coordinator
batch
/subsystem=jgroups/stack=udp/protocol=AUTH:add(add-index=8)
/subsystem=jgroups/stack=udp/protocol=AUTH/token=digest:add(algorithm=SHA-512, shared-secret-reference={clear-text=123PIPPOBAUDO})
/subsystem=jgroups/stack=tcp/protocol=AUTH:add(add-index=8)
/subsystem=jgroups/stack=tcp/protocol=AUTH/token=digest:add(algorithm=SHA-512, shared-secret-reference={clear-text=123PIPPOBAUDO})
run-batch
/subsystem=jgroups/stack=udp/protocol=ASYM_ENCRYPT:add(add-index=5,properties={sym_keylength=128,sym_algorithm=AES/ECB/PKCS5Padding,asym_keylength=512,asym_algorithm=RSA})
/subsystem=jgroups/stack=tcp/protocol=ASYM_ENCRYPT:add(add-index=5,properties={sym_keylength=128,sym_algorithm=AES/ECB/PKCS5Padding,asym_keylength=512,asym_algorithm=RSA})
# we had an error with UDP on PSI
if (outcome != success) of /subsystem=jgroups:read-attribute(name=default-stack)
/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
else
/subsystem=jgroups:write-attribute(name=default-stack,value=tcp)
/subsystem=jgroups/channel=ee:write-attribute(name=stack,value=tcp)
end-if
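After running a script like the one above, it is worth verifying that AUTH and ASYM_ENCRYPT actually landed at the intended positions in the stack; a quick check using standard management operations (a sketch, not part of the original test script):

```
# list the protocols of the tcp stack in order, to confirm ASYM_ENCRYPT and AUTH placement
/subsystem=jgroups/stack=tcp:read-children-names(child-type=protocol)
# confirm which stack the "ee" channel is actually using
/subsystem=jgroups/channel=ee:read-attribute(name=stack)
```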
Moving from "8.0.5.GA-SP1-CR1" to "8.1.0.Beta-CR2" we observed the "ISPN000299" error up to ~2000 times per test session; it appears to be triggered on a node when another node re-joins the cluster after temporarily leaving it (see e.g. test run #287 linked below in comments):
2025-02-03 09:37:39,943 INFO [org.infinispan.CLUSTER] (thread-66,ejb,wildfly3) ISPN100000: Node wildfly4 joined the cluster
...
2025-02-03 09:37:41,558 INFO [org.infinispan.LIFECYCLE] (thread-17,ejb,wildfly3) [Context=clusterbench-ee10.ear/clusterbench-ee10-ejb.jar/default] ISPN100010: Finished rebalance with members [wildfly1, wildfly2, wildfly3, wildfly4], topology id 50
2025-02-03 09:39:21,751 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (thread-97,ejb,wildfly3) ISPN000136: Error executing command LockControlCommand on Cache 'clusterbench-ee10.ear.clusterbench-ee10-web.war', writing keys []: org.infinispan.commons.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionMetaDataKey(84HlxgemQlqvNQBbOv2a0n4fVkfz7D-OJCrjkfCy) and requestor GlobalTransaction{id=56952, addr=wildfly2, remote=true, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=56951, addr=wildfly2, remote=true, xid=null, internalId=-1}
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:299)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:229)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:554)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:481)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:259)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.toInvocationStage(DefaultLockManager.java:283)
Default Configuration Scenario (HAPROXY Load Balancer + JVMKILL induced fail-over)
The third scenario where we spotted the new ISPN000136 error is the default server configuration, where the cluster is fronted by an HAPROXY Load Balancer;
Same as above, it's triggered by a node re-joining the cluster after it was killed, and it pops up on some other node (see test run #306 linked in comments):
2025-02-03 00:16:01,036 INFO [org.infinispan.CLUSTER] (thread-47,ejb,wildfly3) ISPN100000: Node wildfly4 joined the cluster
...
2025-02-03 00:17:40,385 ERROR [org.infinispan.interceptors.impl.InvocationContextInterceptor] (thread-33,ejb,wildfly3) ISPN000136: Error executing command LockControlCommand on Cache 'clusterbench-ee10.ear.clusterbench-ee10-web.war', writing keys []: org.infinispan.commons.TimeoutException: ISPN000299: Unable to acquire lock after 0 milliseconds for key SessionMetaDataKey(8Ca8HLwwLTT1pqmp7FJLkztThB4jnfZye76VXqOX) and requestor GlobalTransaction{id=89144, addr=wildfly1, remote=true, xid=null, internalId=-1}. Lock is held by GlobalTransaction{id=999518, addr=wildfly2, remote=true, xid=null, internalId=-1}
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:299)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.DefaultLockManager$KeyAwareExtendedLockPromise.get(DefaultLockManager.java:229)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.checkState(InfinispanLock.java:554)
at org.infinispan.core@15.0.11.Final//org.infinispan.util.concurrent.locks.impl.InfinispanLock$LockPlaceHolder.toInvocationStage(InfinispanLock.java:481)