Uploaded image for project: 'JBoss Enterprise Application Platform'
  1. JBoss Enterprise Application Platform
  2. JBEAP-26433

(8.0.z) JTS - MDB inbound tx branch not rolled back

XMLWordPrintable

    • False
    • None
    • False
    • Hide

      Issue is intermittent and will require to run the test multiple times:

      export WORKSPACE=$PWD
      wget https://download.devel.redhat.com/devel/candidates/JBEAP/JBEAP-8.0.0.GA-CR2.1/jboss-eap-8.0.0.GA-CR2.1.zip
      unzip jboss-eap-8.0.0.GA-CR2.1.zip
      git clone --branch EAPQE-791 git@gitlab.hosts.mwqe.eng.bos.redhat.com:mnovak/tests-transactions.git
      cd $WORKSPACE/tests-transactions
      mvn clean verify -Djboss.dist=$WORKSPACE/jboss-eap-8.0 -Dcategory=Jms -Dskip-download-sources -U --batch-mode -Dmaven.test.failure.ignore=true -Dsurefire.test.failure.ignore=true --fail-never -Dts.jbossts1.node.identifier=NTAK -Dts.jbossts2.node.identifier=bdeh -Dts.jbossts3.node.identifier=ABnN -Dts.timeout.factor=150 -Djbossts.noJTA -Dtest=JMSMdbBasicCrashRecoveryTestCase#prepareHalt
      
      Show
      Issue is intermittent and will require to run the test multiple times: export WORKSPACE=$PWD wget https: //download.devel.redhat.com/devel/candidates/JBEAP/JBEAP-8.0.0.GA-CR2.1/jboss-eap-8.0.0.GA-CR2.1.zip unzip jboss-eap-8.0.0.GA-CR2.1.zip git clone --branch EAPQE-791 git@gitlab.hosts.mwqe.eng.bos.redhat.com:mnovak/tests-transactions.git cd $WORKSPACE/tests-transactions mvn clean verify -Djboss.dist=$WORKSPACE/jboss-eap-8.0 -Dcategory=Jms -Dskip-download-sources -U --batch-mode -Dmaven.test.failure.ignore= true -Dsurefire.test.failure.ignore= true --fail-never -Dts.jbossts1.node.identifier=NTAK -Dts.jbossts2.node.identifier=bdeh -Dts.jbossts3.node.identifier=ABnN -Dts.timeout.factor=150 -Djbossts.noJTA -Dtest=JMSMdbBasicCrashRecoveryTestCase#prepareHalt

      There is intermittent failure in the following crash recovery scenarios with JTS and MDB listening on "in-vm" Artemis.

      Test Scenario:

      • Start server with configured JTS and deploy MDB
      • Start XA transaction by sending message into the "inbound" queue for MDB
        • There is also enlisted tx branch with TestXAResource to control moment when server should be crashed. Server is killed in the moment when:
        • Prepare is finished on tx branch with "inbound" message for MDB
        • When prepare is called on tx branch with TestXAResource, server is killed and started again but with MDB undeployed
      • Wait for the recovery of XA transaction.

      Expected result: Rollback is expected so message gets back into the "inbound" queue.
      Actual result: Transaction is not rolled back during recovery.

      Customer impact: In case that server with deployed MDB processing message in JTS transaction crashes then it may happen that transaction is not recovered and is "stuck" in the queue.

      Investigation:
      There is started one XA transaction with 3 tx branches. There are 2 tx branches for receiving the message by MDB which is weird but it might be a feature of JTS. During recovery only one is rolled back but when calling rollback for the 2nd branch then exception XAException.XA_RETRY is thrown from Artemis. By investigation of trace logs, this is what happens.

      There is started one XA transaction with 2 tx branches for receiving the message to MDB on inbound connection. However just one is "holding" the message.
      Then during recovery only one is rolled back and the 2nd one is not. From trace logs there is called rollback for the 1st tx branch twice in a row. What happens is that the first attempt succeeds and second one fails (as it was already rolled back). Problem is that this 2nd attempt for rollback throws XAException:

      2024-01-12 18:01:17,361 DEBUG [org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@1c5a52f1)) Sending exception to client: org.apache.activemq.artemis.core.exception.ActiveMQXAException: Cannot find xid in resource manager: XidImpl (1318536416 bq:0.0.0.0.0.0.0.0.0.0.-1.-1.127.0.0.1.18.31.-106.-15.101.-95.126.8.0.0.0.87.0.0.0.0.0.0.0.0 formatID:131072 gtxid:0.0.0.0.0.0.0.0.0.0.-1.-1.127.0.0.1.18.31.-106.-15.101.-95.126.8.0.0.0.70.98.69.108.106 base64:AAAAAAAAAAAAAP__fwAAARIflvFloX4IAAAAVwAAAAAAAAAAAAAAAAAAAAAAAP__fwAAARIflvFloX4IAAAARmJFbGoCAgIA
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.xaRollback(ServerSessionImpl.java:1563)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:522)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:298)
              at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33)
              at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:67)
              at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:56)
              at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
              at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:67)
              at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
              at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
              at org.apache.activemq.artemis.journal//org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118
      

      which closes the underlying in-vm connection 6e6e02f3-b174-11ee-a507-fa163e427ff2 to broker:

      2024-01-12 18:01:17,370 DEBUG [org.apache.activemq.artemis.core.client.impl.ClientSessionImpl] (Thread-107) calling cleanup on ClientSessionImpl [name=6e6e02f3-b174-11ee-a507-fa163e427ff2, username=null, closed=false, factory = org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl@5f28e125, metaData=()]@77377553
      2024-01-12 18:01:17,373 TRACE [org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl] (Thread-107) Connection removed 6e6c2e31-b174-11ee-a507-fa163e427ff2 from server ActiveMQServerImpl::name=default: java.lang.Exception: trace
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl.connectionDestroyed(RemotingServiceImpl.java:590)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMAcceptor$Listener.connectionDestroyed(InVMAcceptor.java:298)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMConnection.close(InVMConnection.java:148)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMAcceptor.disconnect(InVMAcceptor.java:248)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMConnector$Listener.connectionDestroyed(InVMConnector.java:272)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.remoting.impl.invm.InVMConnection.close(InVMConnection.java:148)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.internalClose(RemotingConnectionImpl.java:441)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.destroy(RemotingConnectionImpl.java:243)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.checkCloseConnection(ClientSessionFactoryImpl.java:961)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.closeCleanSessions(ClientSessionFactoryImpl.java:500)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.interruptConnectAndCloseAllSessions(ClientSessionFactoryImpl.java:475)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.close(ClientSessionFactoryImpl.java:508)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.close(ActiveMQXAResourceWrapper.java:393)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.check(ActiveMQXAResourceWrapper.java:420)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.rollback(ActiveMQXAResourceWrapper.java:111)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.ActiveMQXAResourceWrapperImpl.rollback(ActiveMQXAResourceWrapperImpl.java:111)
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.rollback(XAResourceRecord.java:372)
      

      which causes fail of rollback of the 2nd tx branch which was called in the mean time:

      2024-01-12 18:01:17,374 TRACE [org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl] (Periodic Recovery) RemotingConnectionID=6e6c2e31-b174-11ee-a507-fa163e427ff2 Sending blocking SessionXARollbackMessage[type=56, channelID=10, responseAsync=f
      alse, requiresResponse=false, correlationID=-1, xid=< 131072, 32, 36, 0000000000-1-11270011831-106-15101-951268000709869108106, 0000000000-1-11270011831-106-15101-9512680007400000000 >]
      2024-01-12 18:01:17,374 DEBUG [org.apache.activemq.artemis.service.extensions.xa.recovery] (Periodic Recovery) AMQ172016: Error in XA Recovery: javax.transaction.xa.XAException
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:1890)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.rollback(ActiveMQXAResourceWrapper.java:109)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.ActiveMQXAResourceWrapperImpl.rollback(ActiveMQXAResourceWrapperImpl.java:111)
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.rollback(XAResourceRecord.java:372)
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.doRecovery(XAResourceRecord.java:1449)
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.recover(XAResourceRecord.java:1273)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.jts.XARecoveryResourceImple.recover(XARecoveryResourceImple.java:88)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass(XARecoveryModule.java:896)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.bottomUpRecovery(XARecoveryModule.java:605)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:295)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:816)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)
      
      2024-01-12 18:01:17,374 WARN  [com.arjuna.ats.jtax] (Periodic Recovery) ARJUNA024023: XAResourceRecord.rollback caused an XA error: XAException.XA_RETRY from resource org.jboss.activemq.artemis.wildfly.integration.WildFlyActiveMQXAResourceWrapper@4c3fc513 in transaction < 131072, 32, 36, 0000000000-1-11270011831-106-15101-951268000709869108106, 0000000000-1-11270011831-106-15101-9512680007400000000 >: javax.transaction.xa.XAException
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.rollback(ClientSessionImpl.java:1890)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.recovery.ActiveMQXAResourceWrapper.rollback(ActiveMQXAResourceWrapper.java:109)
              at org.apache.activemq.artemis@2.21.0.redhat-00045//org.apache.activemq.artemis.service.extensions.xa.ActiveMQXAResourceWrapperImpl.rollback(ActiveMQXAResourceWrapperImpl.java:111)
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.rollback(XAResourceRecord.java:372)
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.doRecovery(XAResourceRecord.java:1449)
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.recover(XAResourceRecord.java:1273)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.jts.XARecoveryResourceImple.recover(XARecoveryResourceImple.java:88)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass(XARecoveryModule.java:896)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.bottomUpRecovery(XARecoveryModule.java:605)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:295)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:816)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)
      

      Then during the next recovery scan the 2nd tx branch is found but rollback is not called. What happens is that arjuna logs:

      2024-01-12 18:01:17,375 INFO  [com.arjuna.ats.arjuna] (Periodic Recovery) ARJUNA012269: UNKNOWN state for object with uid 0:ffff7f000001:121f96f1:65a17e08:4b , type /CosTransactions/XAResourceRecord
      2024-01-12 18:01:17,375 WARN  [com.arjuna.ats.jtax] (Periodic Recovery) ARJUNA024016: Recovery of resource failed when trying to call rollback got exception: org.omg.CosTransactions.HeuristicHazard: IDL:omg.org/CosTransactions/HeuristicHazard:1.0
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.rollback(XAResourceRecord.java:432)
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.doRecovery(XAResourceRecord.java:1449)
              at org.jboss.jts//com.arjuna.ats.internal.jta.resources.jts.orbspecific.XAResourceRecord.recover(XAResourceRecord.java:1273)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.jts.XARecoveryResourceImple.recover(XARecoveryResourceImple.java:88)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.xaRecoverySecondPass(XARecoveryModule.java:896)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.bottomUpRecovery(XARecoveryModule.java:605)
              at org.jboss.jts//com.arjuna.ats.internal.jta.recovery.arjunacore.XARecoveryModule.periodicWorkSecondPass(XARecoveryModule.java:295)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.doWorkInternal(PeriodicRecovery.java:816)
              at org.jboss.jts//com.arjuna.ats.internal.arjuna.recovery.PeriodicRecovery.run(PeriodicRecovery.java:382)
       

      and even though next recovery scans find this TX there is vote ABSTAIN in XAResourceOrphanFilter and no longer try to roll it back.

      Trace logs attached - see logs.zip 

      I see 3 issues/questions here:

      • Why there are started 2 TX branches for inbound message to MDB?
      • Why the connection is closed for XAException when rollback fails?
      • Why next recovery scans do not recover the 2nd TX branch.  

              Unassigned Unassigned
              mnovak1@redhat.com Miroslav Novak
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Created:
                Updated:
                Resolved: