-
Story
-
Resolution: Done
-
Critical
-
None
-
AMQ 7.8.7.GA
-
False
-
None
-
False
-
-
-
There is intermittent failure in the following crash recovery scenarios with JTS and MDB listening on "in-vm" Artemis.
Test Scenario:
- Start server with configured JTS and deploy MDB
- Start XA transaction by sending message into the "inbound" queue for MDB
- There is also enlisted tx branch with TestXAResource to control moment when server should be crashed. Server is killed in the moment when:
- Prepare is finished on tx branch with "inbound" message for MDB
- When prepare is called on tx branch with TestXAResource, server is killed and started again but with MDB undeployed
- Wait for the recovery of XA transaction.
Expected result: Rollback is expected so message gets back into the "inbound" queue.
Actual result: Transaction is not rolled back during recovery.
Customer impact: In case that server with deployed MDB processing message in JTS transaction crashes then it may happen that transaction is not recovered and is "stuck" in the queue.
Investigation:
There is started one XA transaction with 3 tx branches. There are 2 tx branches for receiving the message by MDB which is weird but it might be a feature of JTS. During recovery only one is rolled back but when calling rollback for the 2nd branch then exception XAException.XA_RETRY is thrown from Artemis. By investigation of trace logs, this is what happens.
There is started one XA transaction with 2 tx branches for receiving the message to MDB on inbound connection. However just one is "holding" the message.
Then during recovery only one is rolled back and the 2nd one is not. From trace logs there is called rollback for the 1st tx branch twice in a row. What happens is that the first attempt succeeds and second one fails (as it was already rolled back). Problem is that this 2nd attempt for rollback throws XAException:
2024-01-12 18:01:17,361 DEBUG [org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1c5a52f1)) Sending exception to client: org.apache.activemq.artemis.core.exception.ActiveMQXAException: Cannot find xid in resource manager: XidImpl (1318536416 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.127.0.0.1.18.31.-106.-15.101.-95.126.8.0.0.0.87.0.0.0.0.0.0.0.0 formatID:131072 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.127.0.0.1.18.31.-106.-15.101.-95.126.8.0.0.0.70.98.69.108.106 base64:AAAAAAAAAAAAAP__fwAAARIflvFloX4IAAAAVwAAAAAAAAAAAAAAAAAAAAAAAP__fwAAARIflvFloX4IAAAARmJFbGoCAgIA at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.xaRollback(ServerSessionImpl.java:1563) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:522) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:298) at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33) at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:67) at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:56) at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:67) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118
which closes the underlying in-vm connection 6e6e02f3-b174-11ee-a507-fa163e427ff2 to broker:
2024-01-12 18:01:17,370 DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionImpl] (Thread-107) calling cleanup on ClientSessionImpl [name=6e6e02f3-b174-11ee-a507-fa163e427ff2, username=null, closed=false, factory = org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl@5f28e125, metaData=()]@77377553 2024-01-12 18:01:17,373 TRACE [org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl] (Thread-107) Connection removed 6e6c2e31-b174-11ee-a507-fa163e427ff2 from server ActiveMQServerImpl::name=default: java.lang.Exception: trace at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.connectionDestroyed(RemotingServiceImpl.java:590) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMAcceptor$Listener.connectionDestroyed(InVMAcceptor.java:298) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMConnection.close(InVMConnection.java:148) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMAcceptor.disconnect(InVMAcceptor.java:248) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMConnector$Listener.connectionDestroyed(InVMConnector.java:272) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMConnection.close(InVMConnection.java:148) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.internalClose(RemotingConnectionImpl.java:441) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.destroy(RemotingConnectionImpl.java:243) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.checkCloseConnection(ClientSessionFactoryImpl.java:961) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.closeCleanSessions(ClientSessionFactoryImpl.java:500) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.interruptConnectAndCloseAllSessions(ClientSessionFactoryImpl.java:475) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.close(ClientSessionFactoryImpl.java:508) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.close(ActiveMQXAResourceWrapper.java:393) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.check(ActiveMQXAResourceWrapper.java:420) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.rollback(ActiveMQXAResourceWrapper.java:111) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.ActiveMQXAResourceWrapperImpl.rollback(ActiveMQXAResourceWrapperImpl.java:111) at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.rollback(XAResourceRecord.java:372)
which causes fail of rollback of the 2nd tx branch which was called in the mean time:
2024-01-12 18:01:17,374 TRACE [org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl] (Periodic Recovery) RemotingConnectionID=6e6c2e31-b174-11ee-a507-fa163e427ff2 Sending blocking SessionXARollbackMessage[type=56, channelID=10, responseAsync=f alse, requiresResponse=false, correlationID=-1, xid=< 131072, 32, 36, 0000000000-1-11270011831-106-15101-951268000709869108106, 0000000000-1-11270011831-106-15101-9512680007400000000 >] 2024-01-12 18:01:17,374 DEBUG [org.apache.activemq.artemis.service.extensions.xa.recovery] (Periodic Recovery) AMQ172016: Error in XA Recovery: javax.transaction.xa.XAException at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:1890) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.rollback(ActiveMQXAResourceWrapper.java:109) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.ActiveMQXAResourceWrapperImpl.rollback(ActiveMQXAResourceWrapperImpl.java:111) at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.rollback(XAResourceRecord.java:372) at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.doRecovery(XAResourceRecord.java:1449) at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.recover(XAResourceRecord.java:1273) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.jts.XARecoveryResourceImple.recover(XARecoveryResourceImple.java:88) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass(XARecoveryModule.java:896) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.bottomUpRecovery(XARecoveryModule.java:605) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:295) at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:816) at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382) 2024-01-12 18:01:17,374 WARN [com.arjuna.ats.jtax] (Periodic Recovery) ARJUNA024023: XAResourceRecord.rollback caused an XA error: XAException.XA_RETRY from resource org.jboss.activemq.artemis.wildfly.integration.WildFlyActiveMQXAResourceWrapper@4c3fc513 in transaction < 131072, 32, 36, 0000000000-1-11270011831-106-15101-951268000709869108106, 0000000000-1-11270011831-106-15101-9512680007400000000 >: javax.transaction.xa.XAException at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:1890) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.rollback(ActiveMQXAResourceWrapper.java:109) at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.ActiveMQXAResourceWrapperImpl.rollback(ActiveMQXAResourceWrapperImpl.java:111) at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.rollback(XAResourceRecord.java:372) at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.doRecovery(XAResourceRecord.java:1449) at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.recover(XAResourceRecord.java:1273) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.jts.XARecoveryResourceImple.recover(XARecoveryResourceImple.java:88) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass(XARecoveryModule.java:896) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.bottomUpRecovery(XARecoveryModule.java:605) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:295) at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:816) at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)
Then during the next recovery scan the 2nd tx branch is found but rollback is not called. What happens is that arjuna logs:
2024-01-12 18:01:17,375 INFO [com.arjuna.ats.arjuna] (Periodic Recovery) ARJUNA012269: UNKNOWN state for object with uid 0:ffff7f000001:121f96f1:65a17e08:4b , type /CosTransactions/XAResourceRecord 2024-01-12 18:01:17,375 WARN [com.arjuna.ats.jtax] (Periodic Recovery) ARJUNA024016: Recovery of resource failed when trying to call rollback got exception: org.omg.CosTransactions.HeuristicHazard: IDL:omg.org/CosTransactions/HeuristicHazard:1.0 at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.rollback(XAResourceRecord.java:432) at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.doRecovery(XAResourceRecord.java:1449) at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.recover(XAResourceRecord.java:1273) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.jts.XARecoveryResourceImple.recover(XARecoveryResourceImple.java:88) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass(XARecoveryModule.java:896) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.bottomUpRecovery(XARecoveryModule.java:605) at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:295) at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:816) at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)
and even though next recovery scans find this TX there is vote ABSTAIN in XAResourceOrphanFilter and no longer try to roll it back.
Trace logs attached - see logs.zip
I see 3 issues/questions here:
- Why there are started 2 TX branches for inbound message to MDB?
- Why the connection is closed for XAException when rollback fails?
- Why next recovery scans do not recover the 2nd TX branch.
- clones
-
JBEAP-26435 (7.4.z) JTS - MDB inbound tx branch not rolled back
- Ready for QA
- is cloned by
-
ENTMQBR-9207 AMQ 7.10 - JTS - MDB inbound tx branch not rolled back
- Refinement
-
ENTMQBR-9206 AMQ 7.8 - JTS - MDB inbound tx branch not rolled back
- Dev Complete