JBoss Enterprise Application Platform / JBEAP-31834

(8.1.x) Lost message after redistribution of a message in the Artemis cluster

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • 8.1 Update 3
    • JMS

      Steps to reproduce (issue is very intermittent):

      git clone git@gitlab.cee.redhat.com:jbossqe-eap/messaging-testsuite.git messaging-testsuite
      cd messaging-testsuite/scripts/
      
      groovy -DEAP_ZIP_URL=<path_to_server_zip_file> PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      mvn clean test -Dsurefire.failIfNoSpecifiedTests=false -Dtest=Lodh5DoubleSendToDbTestCase#testOracleKillJmsNormalMessages -Dversion.artemis=2.44.0 -Djdbc.drivers.download.url=http://www.qa.jboss.com/jdbc-drivers-products/EAP/8.1.0/ -Deap7.clients.version=8.1768226589-8.x-8878-202601120201-SNAPSHOT -Deap7.org.jboss.qa.hornetq.apps.clients.version=8.1768226589-8.x-8878-202601120201-SNAPSHOT
      

      A message is lost in the following crash-failure scenario:

      • Start cluster A, consisting of nodes node-1 and node-3.
      • Start cluster B, consisting of nodes node-2 and node-4.
      • Send messages to InQueue on cluster A (node-1 and node-3).
      • Deploy MDBs to the servers in cluster A. Each MDB reads messages from the local InQueue and, for each message, sends a message to the remote InQueue on cluster B and inserts a row into the database (both in one XA transaction).
      • Deploy MDBs to the servers in cluster B. Each MDB reads messages from the local InQueue and, for each message, inserts a row into the database (same database as cluster A, but a different schema/table).
      • While messages are being processed, kill server node-1.
      • Restart server node-1 and wait until all messages are processed on both clusters.

      Expected result: Every message sent by the producer is inserted into the database, confirming that no message was lost.
      Actual result: The database row for one of the messages is missing.
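
      The expected/actual check above amounts to comparing the set of sent message IDs with the IDs found in the database. A minimal sketch of that comparison follows; the function name and the assumption that both ID lists are already available as plain strings are illustrative and not taken from the testsuite.

```python
def missing_message_ids(sent_ids, stored_ids):
    """Return sent message IDs that have no matching database row, in send order.

    Both arguments are hypothetical inputs: IDs recorded by the producer and
    IDs read back from the target table.
    """
    stored = set(stored_ids)
    return [msg_id for msg_id in sent_ids if msg_id not in stored]
```

      An empty result corresponds to the expected outcome; any returned ID identifies a lost message that can then be searched for in the trace logs.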

      Impact: If a server in the cluster crashes, messages may be lost.

      Investigation: Based on the trace logs, the ID of the lost message is a58e9db1-efc8-11f0-9adf-fa163e3da523. Notably, this message was never processed by any MDB and was never part of an XA transaction. It was sent to InQueue on node-3 and, a while later, redistributed to node-1, where node-1 acknowledges the message on the core bridge:

      node-1-log/server-trace.log:15:09:47,636 TRACE [org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl] (Thread-10 (activemq-default)) RemotingConnectionID=6fce336a ChannelImpl::confirming packet SessionSendMessage_V3[type=71, channelID=10, responseAsync=true, requiresResponse=true, correlationID=271, message=CoreMessage[messageID=5985, durable=true, userID=a58e9db1-efc8-11f0-9adf-fa163e3da523, priority=4, timestamp=Mon Jan 12 15:09:09 UTC 2026, expiration=0, durable=true, address=jms.queue.InQueue, size=700, properties=TypedProperties[__AMQ_CID=a5366b8f-efc8-11f0-9adf-fa163e3da523, _AMQ_ROUTING_TYPE=1, count=261, _AMQ_DUPL_ID=f388c8df-73bc-4234-95b1-bd5be25b3f741768230549856]]@1695530679, requiresResponse=true, correlationID=271, requiresResponse=true] last commandID=271
      

      and node-3 receives this ack:

      node-3-log/server-trace.log:15:09:47,789 TRACE [org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl] (Thread-20 (activemq-client-global)) BridgeImpl::sendAcknowledged bridge ClusterConnectionBridge@78bf1e76 [name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@554a8d8c targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@78bf1e76 [name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@554a8d8c targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=rhos-d-rhel9-xlarge-645671], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@44014442[nodeUUID=6d107cf6-efc8-11f0-b28a-fa163e3da523, connector=TransportConfiguration(name=connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=10080&host=rhos-d-rhel9-xlarge-645671, address=jms, server=ActiveMQServerImpl::name=default])) [initialConnectors=[TransportConfiguration(name=connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=rhos-d-rhel9-xlarge-645671], discoveryGroupConfiguration=null]] Acking PagedReferenceImpl [message=PagedMessageImpl [queueIDs=[121], transactionID=3291, page=60, 
message=CoreMessage[messageID=3290, durable=true, userID=a58e9db1-efc8-11f0-9adf-fa163e3da523, priority=4, timestamp=Mon Jan 12 15:09:09 UTC 2026, expiration=0, durable=true, address=jms.queue.InQueue, size=881, properties=TypedProperties[__AMQ_CID=a5366b8f-efc8-11f0-9adf-fa163e3da523, _AMQ_ROUTING_TYPE=1, _AMQ_ROUTE_TO$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523=[0000 0000 0000 000F], bytesAsLongs[15], count=261, _AMQ_DUPL_ID=f388c8df-73bc-4234-95b1-bd5be25b3f741768230549856]]@488298360], deliveryTime=0, persistedCount=0, deliveryCount=0, subscription=PageSubscriptionImpl [cursorId=121, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@554a8d8c, filter = null]] on queue QueueImpl[name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@554a8d8c
      

      so node-3 considers this message delivered. However, node-1 may not have persisted the message, because it was killed at that moment.

      Based on:

      node-1-log/server-trace.log:15:09:47,631 TRACE [org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl] (Thread-4 (activemq-paging-default)) Paging message PagedMessageImpl [queueIDs=[15], transactionID=5986, page=632, message=CoreMessage[messageID=5985, durable=true, userID=a58e9db1-efc8-11f0-9adf-fa163e3da523, priority=4, timestamp=Mon Jan 12 15:09:09 UTC 2026, expiration=0, durable=true, address=jms.queue.InQueue, size=700, properties=TypedProperties[__AMQ_CID=a5366b8f-efc8-11f0-9adf-fa163e3da523, _AMQ_ROUTING_TYPE=1, count=261, _AMQ_DUPL_ID=f388c8df-73bc-4234-95b1-bd5be25b3f741768230549856]]@1695530679] on pageStore jms.queue.InQueue pageNr=632
      

      node-1 plans to page this message in transactionID=5986, but that transaction is never committed. After the restart, the log contains:

      15:09:54,714 WARN  [org.apache.activemq.artemis.journal] (ServerService Thread Pool -- 86) AMQ142015: Uncommitted transaction with id 5986 found and discarded
      

      so node-1 discarded the transaction together with the message.
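
      The correlation done by hand above (a "Paging message" trace line whose transactionID later shows up in an AMQ142015 warning) can be automated when searching for further occurrences. The following is a rough sketch only: the regular expressions are derived from the two log lines quoted in this report, the function name is hypothetical, and it does not check timestamps or whether a transaction ID is reused across restarts.

```python
import re

# Shape of the PagingStoreImpl trace line quoted in this report.
PAGING_RE = re.compile(
    r"Paging message PagedMessageImpl \[queueIDs=\[[^\]]*\], transactionID=(\d+),"
    r".*?userID=([0-9a-f-]+)"
)
# Shape of the journal warning logged after the restart.
DISCARDED_RE = re.compile(
    r"AMQ142015: Uncommitted transaction with id (\d+) found and discarded"
)


def find_discarded_paged_messages(lines):
    """Return (userID, transactionID) pairs for paged messages whose
    paging transaction was later reported as uncommitted and discarded."""
    paged = []         # (userID, transactionID) in log order
    discarded = set()  # transaction IDs named in AMQ142015 warnings
    for line in lines:
        match = PAGING_RE.search(line)
        if match:
            paged.append((match.group(2), match.group(1)))
            continue
        match = DISCARDED_RE.search(line)
        if match:
            discarded.add(match.group(1))
    return [(user_id, tx) for user_id, tx in paged if tx in discarded]
```

      It could be run over node-1-log/server-trace.log by passing the open file object as `lines`; for the excerpt in this report it would be expected to flag userID a58e9db1-efc8-11f0-9adf-fa163e3da523 with transaction 5986.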

              ehugonne1@redhat.com Emmanuel Hugonnet
              mnovak1@redhat.com Miroslav Novak