Bug
Resolution: Unresolved
Critical
None
39.0.0.Final
None
A message is lost in the following crash-failure scenario:
- Start cluster A with nodes node-1 and node-3.
- Start cluster B with nodes node-2 and node-4.
- Send messages to InQueue on cluster A (node-1 and node-3).
- Deploy MDBs to the servers in cluster A. This MDB reads messages from the local InQueue and, for each message, sends a message to the remote InQueue on cluster B and inserts a row into the database (in an XA transaction).
- Deploy MDBs to the servers in cluster B. This MDB reads messages from the local InQueue and, for each message, inserts a row into the database (a different schema/table from cluster A, but the same database).
- While messages are being processed, kill server node-1.
- Restart server node-1 and wait until all messages are processed on both clusters.
Expected result: Every message sent by the producer has a corresponding row in the database, confirming that no message was lost.
Actual result: The database is missing a row for one of the messages.
Impact: If a server in the cluster crashes, messages may be lost.
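The expected-result check is a set difference between the IDs the producer sent and the rows the MDBs inserted. A minimal sketch, assuming both sides can be read back as lists of message IDs; the function name `find_lost_messages` and its input shapes are hypothetical and not taken from the actual test suite:

```python
def find_lost_messages(sent_ids, inserted_ids):
    """Return the sent message IDs that have no matching database row, sorted."""
    inserted = set(inserted_ids)  # set lookup keeps the check linear in size
    return sorted(mid for mid in sent_ids if mid not in inserted)
```

For example, `find_lost_messages(["m1", "m2", "m3"], ["m3", "m1"])` returns `["m2"]`; in this report the result would be the single ID identified in the investigation below.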
Investigation: Based on the trace logs, the ID of the lost message is a58e9db1-efc8-11f0-9adf-fa163e3da523. Interestingly, this message was never processed by any MDB and was never part of an XA transaction. It was sent to InQueue on node-3 and, some time later, redistributed to node-1, where node-1 confirms the message received over the core bridge:
node-1-log/server-trace.log:15:09:47,636 TRACE [org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl] (Thread-10 (activemq-default)) RemotingConnectionID=6fce336a ChannelImpl::confirming packet SessionSendMessage_V3[type=71, channelID=10, responseAsync=true, requiresResponse=true, correlationID=271, message=CoreMessage[messageID=5985, durable=true, userID=a58e9db1-efc8-11f0-9adf-fa163e3da523, priority=4, timestamp=Mon Jan 12 15:09:09 UTC 2026, expiration=0, durable=true, address=jms.queue.InQueue, size=700, properties=TypedProperties[__AMQ_CID=a5366b8f-efc8-11f0-9adf-fa163e3da523, _AMQ_ROUTING_TYPE=1, count=261, _AMQ_DUPL_ID=f388c8df-73bc-4234-95b1-bd5be25b3f741768230549856]]@1695530679, requiresResponse=true, correlationID=271, requiresResponse=true] last commandID=271
and node-3 receives this ack:
node-3-log/server-trace.log:15:09:47,789 TRACE [org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl] (Thread-20 (activemq-client-global)) BridgeImpl::sendAcknowledged bridge ClusterConnectionBridge@78bf1e76 [name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@554a8d8c targetConnector=ServerLocatorImpl (identity=(Cluster-connection-bridge::ClusterConnectionBridge@78bf1e76 [name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@554a8d8c targetConnector=ServerLocatorImpl [initialConnectors=[TransportConfiguration(name=connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=rhos-d-rhel9-xlarge-645671], discoveryGroupConfiguration=null]]::ClusterConnectionImpl@44014442[nodeUUID=6d107cf6-efc8-11f0-b28a-fa163e3da523, connector=TransportConfiguration(name=connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=10080&host=rhos-d-rhel9-xlarge-645671, address=jms, server=ActiveMQServerImpl::name=default])) [initialConnectors=[TransportConfiguration(name=connector, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory)?httpUpgradeEndpoint=acceptor&activemqServerName=default&httpUpgradeEnabled=true&port=8080&host=rhos-d-rhel9-xlarge-645671], discoveryGroupConfiguration=null]] Acking PagedReferenceImpl [message=PagedMessageImpl [queueIDs=[121], transactionID=3291, page=60, 
message=CoreMessage[messageID=3290, durable=true, userID=a58e9db1-efc8-11f0-9adf-fa163e3da523, priority=4, timestamp=Mon Jan 12 15:09:09 UTC 2026, expiration=0, durable=true, address=jms.queue.InQueue, size=881, properties=TypedProperties[__AMQ_CID=a5366b8f-efc8-11f0-9adf-fa163e3da523, _AMQ_ROUTING_TYPE=1, _AMQ_ROUTE_TO$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523=[0000 0000 0000 000F], bytesAsLongs[15], count=261, _AMQ_DUPL_ID=f388c8df-73bc-4234-95b1-bd5be25b3f741768230549856]]@488298360], deliveryTime=0, persistedCount=0, deliveryCount=0, subscription=PageSubscriptionImpl [cursorId=121, queue=QueueImpl[name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@554a8d8c, filter = null]] on queue QueueImpl[name=$.artemis.internal.sf.my-cluster.28277ee1-efc8-11f0-b34f-fa163e3da523, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::name=default], temp=false]@554a8d8c
so node-3 considers this message delivered. However, node-1 apparently did not persist the message, because it was killed at that moment.
This is based on the following node-1 entry:
node-1-log/server-trace.log:15:09:47,631 TRACE [org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl] (Thread-4 (activemq-paging-default)) Paging message PagedMessageImpl [queueIDs=[15], transactionID=5986, page=632, message=CoreMessage[messageID=5985, durable=true, userID=a58e9db1-efc8-11f0-9adf-fa163e3da523, priority=4, timestamp=Mon Jan 12 15:09:09 UTC 2026, expiration=0, durable=true, address=jms.queue.InQueue, size=700, properties=TypedProperties[__AMQ_CID=a5366b8f-efc8-11f0-9adf-fa163e3da523, _AMQ_ROUTING_TYPE=1, count=261, _AMQ_DUPL_ID=f388c8df-73bc-4234-95b1-bd5be25b3f741768230549856]]@1695530679] on pageStore jms.queue.InQueue pageNr=632
node-1 is about to page this message in transactionID=5986, but that transaction is never committed. After the restart, node-1 logs:
15:09:54,714 WARN [org.apache.activemq.artemis.journal] (ServerService Thread Pool -- 86) AMQ142015: Uncommitted transaction with id 5986 found and discarded
so the broker discarded the transaction together with the message.
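The investigation reduces to an ordering question: node-3 drops the message from its store-and-forward queue as soon as node-1 confirms the packet, while node-1 makes the message durable only when the paging transaction commits. The following is a simplified model of that reading of the logs, not Artemis code; the classes `SourceNode` and `TargetNode`, the `forward` function and its flags are all hypothetical. It shows that confirming before committing loses the message if the target crashes in between, whereas committing first does not.

```python
class TargetNode:
    """Stands in for node-1: a message is durable only after its tx commits."""

    def __init__(self):
        self.pending = {}     # tx_id -> message id, written but not yet committed
        self.durable = set()  # message ids that survive a crash

    def begin_store(self, tx_id, msg_id):
        self.pending[tx_id] = msg_id

    def commit(self, tx_id):
        self.durable.add(self.pending.pop(tx_id))

    def crash(self):
        # Uncommitted work is discarded on restart (cf. AMQ142015 above).
        self.pending.clear()


class SourceNode:
    """Stands in for node-3: the message leaves the queue once the target confirms."""

    def __init__(self, msg_ids):
        self.queue = set(msg_ids)

    def on_confirmed(self, msg_id):
        # Models the bridge acking the message on the source side.
        self.queue.discard(msg_id)


def forward(msg_id, tx_id, confirm_before_commit, crash_after_confirm):
    """Return True if msg_id still exists on either node after the scenario."""
    source, target = SourceNode([msg_id]), TargetNode()
    target.begin_store(tx_id, msg_id)
    if not confirm_before_commit:
        target.commit(tx_id)       # safe ordering: durable before confirming
    source.on_confirmed(msg_id)    # source forgets the message here
    if crash_after_confirm:
        target.crash()             # target dies before any later commit
    elif confirm_before_commit:
        target.commit(tx_id)       # no crash: the late commit still happens
    return msg_id in source.queue or msg_id in target.durable
```

In this model only the confirm-before-commit ordering combined with a crash loses the message, which matches the observed symptom; it does not establish where in the broker the ordering is actually violated.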
Is cloned by: JBEAP-31834 (8.1.x) Lost message after redistribution of a message in the Artemis cluster (Open)