Customer Impact: If cluster topology is used, some messages may be lost
Scenario:
- Cluster A with node 1 and 3 is started (JMS servers).
- Cluster B with node 2 and 4 is started (MDB servers).
- Producer sends 20000 messages (mix of small and large messages) to InQueue on cluster A. * Once producer finishes, deploy MDBs (nodes 2, 4) which read messages from InQueue (nodes 1,3) and sends them in XA transaction to OutQueue (nodes 1,3).
- Node 1 and Node 3 are killed on transaction prepare.
- Read messages from OutQueue.
Expectation: Receiver receives all messages sent by producer.
Reality: Receiver doesn't receive all messages sent by producer. Some of them are lost.
In attached logs, message with id d50ac98a-1f78-11e7-bcf4-001b217d6db1 is lost approximately 10 seconds before fist kill done in test. The lost message is large and paged.
Message is sent to node-1. MDB on node-2 reads it from InQueue and according to logs, onMessage method finishes
08:18:20,284 TRACE [org.apache.activemq.artemis.ra] (Thread-8 (ActiveMQ-client-global-threads-952438696)) finished onMessage on ClientLargeMessageImpl[messageID=2432, durable=true, address=jms.queue.InQueue,userID=d50ac98a-1f78-11e7-bcf4-001b217d6db1,properties=TypedProperties[__AMQ_CID=ca7919cb-1f78-11e7-bcf4-001b217d6db1,count=636,counter=637,_AMQ_DUPL_ID=5e537488-12e8-4f52-a501-7baa52c217c61491998948888,_AMQ_LARGE_SIZE=204805,color=GREEN]]
These are the latest mentions of this mesage in trace logs
08:18:19,890 TRACE [org.apache.activemq.artemis.core.server.impl.ServerSessionImpl] (default I/O-7) send(message=ServerMessage[messageID=74089,durable=true,userID=1d76def3-1f7a-11e7-b64f-001b217d6db1,priority=4, bodySize=584, timestamp=Wed Apr 12 08:18:19 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:d50ac98a-1f78-11e7-bcf4-001b217d6db1,__AMQ_CID=1176a9f4-1f7a-11e7-b64f-001b217d6db1,_AMQ_DUPL_ID=5e537488-12e8-4f52-a501-7baa52c217c61491998948888]]@1947073419, direct=true) being called 08:18:19,890 TRACE [org.apache.activemq.artemis.core.postoffice.impl.BindingsImpl] (default I/O-7) Routing message ServerMessage[messageID=74089,durable=true,userID=1d76def3-1f7a-11e7-b64f-001b217d6db1,priority=4, bodySize=584, timestamp=Wed Apr 12 08:18:19 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:d50ac98a-1f78-11e7-bcf4-001b217d6db1,__AMQ_CID=1176a9f4-1f7a-11e7-b64f-001b217d6db1,_AMQ_DUPL_ID=5e537488-12e8-4f52-a501-7baa52c217c61491998948888]]@1947073419 on binding=BindingsImpl [name=jms.queue.OutQueue] 08:18:19,890 TRACE [org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl] (default I/O-7) Message after routed=ServerMessage[messageID=74089,durable=true,userID=1d76def3-1f7a-11e7-b64f-001b217d6db1,priority=4, bodySize=584, timestamp=Wed Apr 12 08:18:19 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:d50ac98a-1f78-11e7-bcf4-001b217d6db1,__AMQ_CID=1176a9f4-1f7a-11e7-b64f-001b217d6db1,_AMQ_DUPL_ID=5e537488-12e8-4f52-a501-7baa52c217c61491998948888]]@1947073419 08:18:19,890 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-6 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@3e511390-1508940791)) appendAddRecordTransactional:txID=74088,id=74089, userRecordType=31, record = ServerMessage[messageID=74089,durable=true,userID=1d76def3-1f7a-11e7-b64f-001b217d6db1,priority=4, bodySize=584, timestamp=Wed Apr 12 08:18:19 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[inMessageId=ID:d50ac98a-1f78-11e7-bcf4-001b217d6db1,__AMQ_CID=1176a9f4-1f7a-11e7-b64f-001b217d6db1,_AMQ_DUPL_ID=5e537488-12e8-4f52-a501-7baa52c217c61491998948888]]@1947073419, usedFile = JournalFileImpl: (activemq-data-1.amq id = 49, recordID = 49) 08:18:21,141 TRACE [org.apache.activemq.artemis.core.paging.impl.Page] (default I/O-15) Reading message PagedMessageImpl [queueIDs=[30], transactionID=2307, message=LargeServerMessage[messageID=2432,durable=true,userID=d50ac98a-1f78-11e7-bcf4-001b217d6db1,priority=4, timestamp=Wed Apr 12 08:09:08 EDT 2017,expiration=0, durable=true, address=jms.queue.InQueue,properties=TypedProperties[__AMQ_CID=ca7919cb-1f78-11e7-bcf4-001b217d6db1,count=636,counter=637,_AMQ_DUPL_ID=5e537488-12e8-4f52-a501-7baa52c217c61491998948888,_AMQ_LARGE_SIZE=204805,color=GREEN]]@1186420511] on pageId=322 for address=jms.queue.InQueue 08:18:21,356 TRACE [org.apache.activemq.artemis.core.paging.impl.Page] (Thread-21 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@617a1053-926227281)) Reading message PagedMessageImpl [queueIDs=[30], transactionID=2307, message=LargeServerMessage[messageID=2432,durable=true,userID=d50ac98a-1f78-11e7-bcf4-001b217d6db1,priority=4, timestamp=Wed Apr 12 08:09:08 EDT 2017,expiration=0, durable=true, address=jms.queue.InQueue,properties=TypedProperties[__AMQ_CID=ca7919cb-1f78-11e7-bcf4-001b217d6db1,count=636,counter=637,_AMQ_DUPL_ID=5e537488-12e8-4f52-a501-7baa52c217c61491998948888,_AMQ_LARGE_SIZE=204805,color=GREEN]]@1608296841] on pageId=322 for address=jms.queue.InQueue
At the end of the test, this message is not received from OutQueue.
This is regression against 7.0.z. releases.
Issue is under investigation.
See private comment for full trace logs.
- duplicates
-
JBEAP-9937 Lost messages in cluster with server crashes
- Closed
- is incorporated by
-
JBEAP-10454 Upgrade Artemis 1.5.4.jbossorg-003
- Closed
- relates to
-
JBEAP-9937 Lost messages in cluster with server crashes
- Closed
-
ARTEMIS-1114 Loading...