We have lost messages in one of our HA test scenario.
Test scenario:
Two servers are configured in replicated colocated topology.
Clients (producer and subscriber) are connected to server1 sending and receiving messages from topic.
Server1 is killed and failover is executed. Then server1 is started again and clients make failback back and continue sending and receiving messages.
Result:
In the end clients are stopped and sent messages are compared with received messages.
In our case one message is missing on subscriber probably due to exception [1]. On server side are warnings [2] and [3] (which I'm not sure if are related with this issue)
Comment in PagecursorProviderImpl on line 113 says: "// sanity check, this ([3]) should never happen unless there's a bug"
According to traces the lost message is large and is the last but one message written to the journal on last page .
Expected result: No message loss or duplicates. We also do not expect exceptions [1][3].
In attachment are traces of both nodes configs and complete log from test.
Similar issue was hit in Artemis test suite. In the test
PagingTest#testDeleteQueueRestart consumer get exception [4] when he
tries to commit received messages. On server side warning messages [5]
appear. I execute the test on matrix job with 9 combinations and it
always fail in one or two cases.
[1]
10:39:33,727 Thread-29 ERROR [org.jboss.qa.hornetq.apps.clients.SubscriberTransAck:439] RETRY receive for host: 127.0.0.1, Trying to receive message with counter: 1081 javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41 at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:350) at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sendACK(ActiveMQSessionContext.java:421) at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.acknowledge(ClientSessionImpl.java:696) at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.doAck(ClientConsumerImpl.java:1035) at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.acknowledge(ClientConsumerImpl.java:702) at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:96) at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:38) at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.getMessage(ActiveMQMessageConsumer.java:212) at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.receive(ActiveMQMessageConsumer.java:119) at org.jboss.qa.hornetq.apps.clients.SubscriberTransAck.receiveMessage(SubscriberTransAck.java:427) at org.jboss.qa.hornetq.apps.clients.SubscriberTransAck.run(SubscriberTransAck.java:139) Caused by: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41]
[2]
WARN [org.apache.activemq.artemis.core.server] (Thread-6 (ActiveMQ-client-netty-threads-1223857560)) AMQ222086: error handling packet ReplicationCommitMessage[type=96, channel=2, journalID=1, txAction='rollback'] for replication: ActiveMQUnsupportedPacketException[errorType=UNSUPPORTED_PACKET message=null] at org.apache.activemq.artemis.core.journal.impl.FileWrapperJournal.appendRollbackRecord(FileWrapperJournal.java:215) [artemis-journal-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.journal.impl.JournalBase.appendRollbackRecord(JournalBase.java:172) [artemis-journal-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.journal.impl.FileWrapperJournal.appendRollbackRecord(FileWrapperJournal.java:51) [artemis-journal-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.handleCommitRollback(ReplicationEndpoint.java:609) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.handlePacket(ReplicationEndpoint.java:182) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.handlePacket(ChannelImpl.java:567) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.doBufferReceived(RemotingConnectionImpl.java:349) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.bufferReceived(RemotingConnectionImpl.java:331) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingBufferHandler.bufferReceived(ClientSessionFactoryImpl.java:1105) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7]
[3]
WARN [org.apache.activemq.artemis.core.server] (Thread-9 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@57e1a31c-1495685008)) Invalid messageNumber passed = PagePositionImpl [pageNr=69, messageNr=0, recordID=2147488813] on null: java.lang.IllegalStateException: Invalid messageNumber passed = PagePositionImpl [pageNr=69, messageNr=0, recordID=2147488813] on null at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.getMessage(PageCursorProviderImpl.java:114) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.paging.cursor.impl.PageSubscriptionImpl.queryMessage(PageSubscriptionImpl.java:511) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.paging.cursor.PagedReferenceImpl.getPagedMessage(PagedReferenceImpl.java:64) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.paging.cursor.PagedReferenceImpl.getMessage(PagedReferenceImpl.java:52) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.server.impl.RefsOperation.decrementRefCount(RefsOperation.java:163) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterCommit(RefsOperation.java:156) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterCommit(TransactionImpl.java:461) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$100(TransactionImpl.java:34) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$2.done(TransactionImpl.java:273) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:201) [artemis-server-1.1.0.redhat-7.jar:1.1.0.redhat-7] at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:100) [artemis-core-client-1.1.0.redhat-7.jar:1.1.0.redhat-7] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_05] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_05] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_05]
[4]
org.apache.activemq.artemis.api.core.ActiveMQConnectionTimedOutException: AMQ119014:
Timed out after waiting 30,000 ms for response when sending packet 43
at
org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:368)
at
org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.simpleCommit(ActiveMQSessionContext.java:301)
at
org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.commit(ClientSessionImpl.java:509)
at
org.apache.activemq.artemis.tests.integration.client.PagingTest.testDeleteQueueRestart(PagingTest.java:870)
[5]
06:01:20,799 WARN [org.apache.activemq.artemis.core.server] Invalid messageNumber passed = PagePositionImpl [pageNr=344, messageNr=8, recordID=21496] on null: java.lang.IllegalStateException: Invalid messageNumber passed = PagePositionImpl [pageNr=344, messageNr=8, recordID=21496] on null at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.getMessage(PageCursorProviderImpl.java:117) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.core.paging.cursor.impl.PageSubscriptionImpl.queryMessage(PageSubscriptionImpl.java:532) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.core.paging.cursor.PagedReferenceImpl.getPagedMessage(PagedReferenceImpl.java:66) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.core.paging.cursor.PagedReferenceImpl.getMessage(PagedReferenceImpl.java:53) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.core.server.impl.RefsOperation.decrementRefCount(RefsOperation.java:163) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.core.server.impl.RefsOperation.afterCommit(RefsOperation.java:156) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterCommit(TransactionImpl.java:488) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$100(TransactionImpl.java:34) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$2.done(TransactionImpl.java:287) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:209) [artemis-server-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:100) [artemis-core-client-1.3.0-SNAPSHOT.jar:1.3.0-SNAPSHOT] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_71] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_71] at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_71]
This issue is still under investigation
Note: hit also with 6.4.0.DR11, severity high - no customer case until today, thus not marked as blocker.
Customer impact: Customer may loose important data which was contained inside the message.
- clones
-
JBEAP-5255 (7.1.0) Lost message when using transaction session on subscriber/consumer.
- Verified
- is incorporated by
-
JBEAP-4679 (7.0.z) Upgrade Artemis from 1.1.0.SP17 to 1.1.0.SP18
- Verified
- is related to
-
JBEAP-3403 [Artemis Testsuite] ConcurrentCreateDeleteProduceTest#testConcurrentProduceCreateAndDelete fails
- Verified