-
Bug
-
Resolution: Done
-
Critical
-
7.0.0.ER5, 7.0.0.ER6
Test scenario:
1. Start live server with replicated journal and queue testQueue0
2. Send 500 large messages to testQueue0 on live
3. Start backup server and start receiving messages from testQueue0 (session CLIENT_ACKNOWLEDGE)
4. Before backup is announced/synchronized with live, cleanly shut down backup
5. Wait until receiver consumes all messages
Expected result:
Receiver consumed 500 messages. No losses or duplicates.
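The expected result can be checked by comparing the IDs of sent and received messages. The following is only a minimal sketch of such a check, not the code of the actual test suite; the class name DeliveryCheck, its methods, and the sample IDs are hypothetical and use plain strings instead of JMS messages.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares sent and received message IDs.
final class DeliveryCheck {

    /** IDs that were sent but never received (lost messages). */
    static Set<String> lost(List<String> sent, List<String> received) {
        Set<String> missing = new LinkedHashSet<>(sent);
        missing.removeAll(new HashSet<>(received));
        return missing;
    }

    /** IDs that were received more than once (duplicates). */
    static Set<String> duplicates(List<String> received) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String id : received) {
            // add() returns false when the ID was already seen
            if (!seen.add(id)) {
                dups.add(id);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Made-up sample data: 500 sent IDs, one of them never received.
        List<String> sent = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            sent.add("msg-" + i);
        }
        List<String> received = new ArrayList<>(sent);
        received.remove("msg-57");

        System.out.println("lost: " + lost(sent, received));
        System.out.println("duplicates: " + duplicates(received));
    }
}
```

The test passes only when both sets are empty and the receiver's count equals the number of sent messages.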
Actual result:
Messages are lost. The client did not receive all messages, and the missing messages are not in the journal of the live server after the test.
By tracking the message ID of a lost message, we found that the message was sent to the receiver. Because it is a large message, the receiver tries to ack the message before session.commit() is called. This appears to be some kind of pre-ack. The ack is sent to the live server, which tries to replicate it to the backup. But the backup is already shut down (step 4), and live waits for the cluster-connection call timeout (30s) before it gives up. After 30s it stores the ack in live's journal and responds to the client. The problem is that the client hits its own call-timeout (30s) before the response arrives, and gets a JMSException like:
16:26:12,983 Thread-27 ERROR [org.jboss.qa.hornetq.apps.clients.ReceiverClientAck:341] RETRY receive for host: 127.0.0.1, Trying to receive message with count: 57
javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41
    at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:350)
    at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sendACK(ActiveMQSessionContext.java:421)
    at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.acknowledge(ClientSessionImpl.java:696)
    at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.doAck(ClientConsumerImpl.java:1035)
    at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.acknowledge(ClientConsumerImpl.java:702)
    at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:96)
    at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:38)
    at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.getMessage(ActiveMQMessageConsumer.java:212)
    at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.receive(ActiveMQMessageConsumer.java:119)
    at org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.receiveMessage(ReceiverClientAck.java:333)
    at org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.run(ReceiverClientAck.java:169)
Caused by: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41]
    ... 11 more
The problem for the client is that the message was acked on the live server and will therefore never be redelivered to the consumer. So from the consumer's point of view the message is lost.
In the described scenario, messages are lost when an admin merely starts and then cleanly shuts down the backup server. Nothing caused a crash on the live server.
Customer impact: If the backup server is shut down before synchronization with live is complete and a client consumes a large message, calling receive() on the consumer might time out on the client side while the message is acked on the live server and marked as delivered. From the client's point of view this message is lost.
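The timing described above can be illustrated with a small deterministic model. This is a simplified sketch under stated assumptions and does not use or reproduce any Artemis code: the class AckTimeoutModel and its names are hypothetical, network and journal latency are ignored, and the only inputs are the two 30s timeouts mentioned in this report.

```java
// Hypothetical toy model of the ack exchange described in this report.
final class AckTimeoutModel {

    /** Result of one simulated ack call. */
    static final class Outcome {
        final boolean ackStoredOnLive;
        final boolean clientTimedOut;

        Outcome(boolean ackStoredOnLive, boolean clientTimedOut) {
            this.ackStoredOnLive = ackStoredOnLive;
            this.clientTimedOut = clientTimedOut;
        }

        /** Acked on live but reported as failed to the client: never redelivered. */
        boolean lostForConsumer() {
            return ackStoredOnLive && clientTimedOut;
        }
    }

    static Outcome simulate(long clientCallTimeoutMs,
                            long clusterCallTimeoutMs,
                            boolean backupResponds) {
        // Assumption: live answers immediately when the backup confirms the
        // replicated ack, otherwise only after it gives up waiting for the backup.
        long liveRespondsAtMs = backupResponds ? 0 : clusterCallTimeoutMs;

        // Per the report, live stores the ack in its journal in either case.
        boolean ackStoredOnLive = true;

        // The client gives up if no response has arrived when its timeout expires.
        boolean clientTimedOut = liveRespondsAtMs >= clientCallTimeoutMs;

        return new Outcome(ackStoredOnLive, clientTimedOut);
    }

    public static void main(String[] args) {
        // Values from the report: both timeouts are 30s and the backup is down.
        Outcome reported = simulate(30_000, 30_000, false);
        System.out.println("client timed out: " + reported.clientTimedOut);
        System.out.println("lost for consumer: " + reported.lostForConsumer());
    }
}
```

In this model the loss appears only when the backup does not respond and the client's call-timeout is not longer than the live server's wait. Whether changing either timeout is an acceptable workaround in practice is not stated in this report.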
- is blocked by
-
WFLY-6847 Lost large messages if backup is shutdown during synchronization
- Closed
- is cloned by
-
JBEAP-3419 (7.0.z) Lost large messages if backup is shutdown during synchronization
- Verified
- is incorporated by
-
JBEAP-5256 (7.1.0) Upgrade Artemis from 1.1.0.SP17 to 1.1.0.SP18
- Verified