Scenario:
- There are 4 EAP servers.
- Servers 1 and 3 are in a cluster configured with Netty static connectors. They hold the messages and are referred to in this text as JMS servers.
- Servers 2 and 4 are not in the cluster. Their resource adapter (RA) is configured to connect to servers 1 and 3 using Netty static connectors, and they host MDBs which resend messages from InQueue to OutQueue (a hedged sketch of such an MDB follows these scenario steps). Servers 2 and 4 are referred to in this text as MDB servers.
- At the beginning of the test, a standalone producer sends 10,000 messages to server 1.
- MDBs are deployed to the MDB servers.
- The JMS servers are restarted.
- The test waits until all messages are resent from InQueue to OutQueue.
- A standalone receiver receives all messages from OutQueue on server 1.
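For orientation, the resending MDBs in the scenario are ordinary JMS message-driven beans. The following is a minimal sketch of what such an MDB might look like; the class name and the JNDI names java:/jms/queue/InQueue and java:/jms/queue/OutQueue are assumptions, not the actual test code.

import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.inject.Inject;
import javax.jms.JMSContext;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;

// Hypothetical resending MDB; the real test code may differ.
@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "java:/jms/queue/InQueue")
})
public class ResendingMdb implements MessageListener {

    @Inject
    private JMSContext jmsContext;

    // Assumed JNDI name of the target queue.
    @Resource(lookup = "java:/jms/queue/OutQueue")
    private Queue outQueue;

    @Override
    public void onMessage(Message message) {
        // The container-managed XA transaction ties the receive from InQueue
        // and the send to OutQueue together, so a rollback redelivers the message.
        jmsContext.createProducer().send(outQueue, message);
    }
}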
Expectation: All messages sent to InQueue are received from OutQueue.
Reality: It sometimes happens that a few messages remain stuck in InQueue on server 1. See the details below.
Customer impact: Some messages may not be delivered in time, and a server restart is needed to reschedule delivery of those messages.
This issue is not reported as a blocker because the messages are not lost and it is an intermittent issue, occurring in an edge case when events in the cluster happen in a specific order. The issue is also present in previous versions of EAP.
Detailed description of the issue
Note: Since Artemis does not log enough detail in the code where the issue occurs, I added more log messages to be able to track what happened to the messages. The changes are attached as 0001-Added-logging.patch.
I investigated the message with _AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750.
In the trace logs on server 1, I can see that the message was sent to the MDB server and is in the delivering state. When I tracked the message down on the MDB server, I found these log messages:
05:23:59,696 TRACE [org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl] (Thread-22 (ActiveMQ-client-netty-threads)) handling packet PACKET(SessionReceiveClientLargeMessage)[type=76, channelID=10, packetObject=SessionReceiveClientLargeMessage, consumerID=0, deliveryCount=2, largeMessageSize=102400, message=ClientLargeMessageImpl[messageID=1089, durable=true, address=jms.queue.InQueue,userID=7006ae5f-9f77-11e7-8182-0015177911ca,properties=TypedProperties[__AMQ_CID=6d970d75-9f77-11e7-8182-0015177911ca,count=375,counter=376,_AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750,_AMQ_LARGE_SIZE=102400,color=RED]]]
05:23:59,703 TRACE [org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl] (Thread-22 (ActiveMQ-client-netty-threads)) Adding message=ClientLargeMessageImpl[messageID=1089, durable=true, address=jms.queue.InQueue,userID=7006ae5f-9f77-11e7-8182-0015177911ca,properties=TypedProperties[__AMQ_CID=6d970d75-9f77-11e7-8182-0015177911ca,count=375,counter=376,_AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750,_AMQ_LARGE_SIZE=102400,color=RED]] to buffer.
05:23:59,704 TRACE [org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl] (Thread-22 (ActiveMQ-client-netty-threads)) Checking handler=org.apache.activemq.artemis.ra.inflow.ActiveMQMessageHandler@1fdf07cd for message=ClientLargeMessageImpl[messageID=1089, durable=true, address=jms.queue.InQueue,userID=7006ae5f-9f77-11e7-8182-0015177911ca,properties=TypedProperties[__AMQ_CID=6d970d75-9f77-11e7-8182-0015177911ca,count=375,counter=376,_AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750,_AMQ_LARGE_SIZE=102400,color=RED]]
05:23:59,704 TRACE [org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl] (Thread-22 (ActiveMQ-client-netty-threads)) Executor was not queued because consumer is stopped. Message=ClientLargeMessageImpl[messageID=1089, durable=true, address=jms.queue.InQueue,userID=7006ae5f-9f77-11e7-8182-0015177911ca,properties=TypedProperties[__AMQ_CID=6d970d75-9f77-11e7-8182-0015177911ca,count=375,counter=376,_AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750,_AMQ_LARGE_SIZE=102400,color=RED]]
As you can see, the last log message is "Executor was not queued because consumer is stopped."
The stopping of the consumer was likely caused by the ClearAtFailover event:
ntext=ActiveMQConsumerContext{id=0}, queueName=jms.queue.InQueue}::ClearAtFailover: java.lang.Throwable: trace
    at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.clearAtFailover(ClientConsumerImpl.java:507) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:946) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:771) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:614) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:504) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$500(ClientSessionFactoryImpl.java:72) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1165) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.callFailureListeners(AbstractRemotingConnection.java:67) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:208) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.fail(AbstractRemotingConnection.java:210) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:996) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:122) [artemis-commons-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_71]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_71]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_71]
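To make the interaction between the two trace messages easier to follow, here is a minimal, self-contained sketch of the pattern: a delivery arriving while the consumer is stopped is only added to the local buffer (no handler executor is queued), and clearAtFailover() wipes that buffer, so the delivery is never processed or acknowledged. This is an illustration of the behaviour described in this report, not the actual ClientConsumerImpl code.

import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Illustrative only; not the real org.apache.activemq.artemis ClientConsumerImpl.
public class StoppedConsumerSketch {

    private final Queue<String> buffer = new ArrayDeque<>();
    private final Executor sessionExecutor;
    private volatile boolean stopped;

    public StoppedConsumerSketch(Executor sessionExecutor) {
        this.sessionExecutor = sessionExecutor;
    }

    // Called for each delivery arriving from the server.
    public synchronized void handleMessage(String message) {
        buffer.add(message);          // "Adding message ... to buffer."
        if (stopped) {
            // "Executor was not queued because consumer is stopped."
            // The message now sits in a buffer that nothing will drain.
            return;
        }
        sessionExecutor.execute(this::deliverToHandler);
    }

    // Failover handling: stop delivery and discard anything buffered,
    // mirroring the ClearAtFailover event in the stack trace above.
    public synchronized void clearAtFailover() {
        stopped = true;
        buffer.clear();               // buffered deliveries are thrown away, never acknowledged
    }

    private synchronized void deliverToHandler() {
        String message = buffer.poll();
        if (message != null) {
            System.out.println("delivering " + message);
        }
    }
}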
This issue likely happens when both JMS servers are restarted at once. The MDB server receives a Disconnect packet and triggers its connection failure listeners. Since the MDB creates 15 sessions by default, executing all of the connection failure procedures may take a while. In the meantime the JMS servers are restarted and start sending messages to the MDB server. As mentioned previously, these messages are thrown away by the ClearAtFailover event, which was triggered by the connection failure that happened a few seconds earlier.
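The 15 sessions mentioned above correspond to the resource adapter's maxSession activation property, which defaults to 15; each session has its own consumer and therefore its own share of the failover work. A hypothetical sketch of the property being set explicitly (class name and queue lookup are assumptions):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(activationConfig = {
        @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
        @ActivationConfigProperty(propertyName = "destinationLookup", propertyValue = "java:/jms/queue/InQueue"),
        // 15 is the resource adapter default; shown here only to make the session count explicit.
        @ActivationConfigProperty(propertyName = "maxSession", propertyValue = "15")
})
public class InQueueMdb implements MessageListener {
    @Override
    public void onMessage(Message message) {
        // message processing omitted; only the activation properties matter here
    }
}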
The JMS server which sent the messages then waits for acknowledgments that the messages were received by the MDB server. However, these acknowledgments never arrive, so the messages remain stuck in InQueue on server 1.
relates to:
- ENTMQBR-8173 EAP 8 - Undelivered messages after restart in cluster (New)
- JBEAP-23608 (8.0.z) Messaging - Number of consumers should be almost equal on server restart (New)
- JBEAP-25103 EAP 8 - Undelivered messages after restart in cluster (New)