
Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: 7.0.2.CR1, 7.0.2.GA
Affects Version/s: 7.0.0.ER5, 7.0.0.ER6
Component/s: ActiveMQ
Labels: None

Bugzilla References:
https://bugzilla.redhat.com/show_bug.cgi?id=1310537
CDW blocker:
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Target Release: 7.0.z.GA
Steps to Reproduce:

How to run the test locally:

git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
cd eap-tests-hornetq/scripts/
git checkout refactoring_modules
groovy -DEAP_VERSION=7.0.0.ER5 PrepareServers7.groovy
export WORKSPACE=$PWD
export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
cd ../jboss-hornetq-testsuite/
mvn clean test -Dtest=ReplicatedDedicatedFailoverTestWithMdb#testJustFailbackWithLargeMessages -DfailIfNoTests=false -Deap=7x | tee log

Sprint:
EAP 7.0.2

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Test scenario:
1. Start live server with replicated journal and queue testQueue0
2. Send 500 large messages to testQueue0 on live
3. Start backup server and start receiving messages from testQueue0 (session CLIENT_ACKNOWLEDGE)
4. Before backup is announced/synchronized with live, cleanly shut down backup
5. Wait until receiver consumes all messages

Expected result:
Receiver consumed 500 messages. No losses or duplicates.
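
The "no losses or duplicates" expectation can be checked by comparing the IDs of sent and received messages. The following is a minimal sketch of such a check, not the verification code used by the testsuite; the class name `DeliveryCheck` and both method names are hypothetical, and message IDs are assumed to be plain strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of eap-tests-hornetq.
class DeliveryCheck {

    // IDs that were sent but never received (lost messages), in send order.
    static Set<String> lost(List<String> sent, List<String> received) {
        Set<String> missing = new LinkedHashSet<>(sent);
        missing.removeAll(received);
        return missing;
    }

    // IDs that were received more than once (duplicates), in order of first repeat.
    static Set<String> duplicates(List<String> received) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String id : received) {
            if (!seen.add(id)) {
                dups.add(id);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        List<String> sent = Arrays.asList("m1", "m2", "m3");
        List<String> received = Arrays.asList("m1", "m3", "m3");
        System.out.println("lost: " + lost(sent, received));
        System.out.println("duplicates: " + duplicates(received));
    }
}
```

The test passes only when both sets are empty and the receiver's count equals the number of messages sent.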

Actual result:
Messages are lost. The client does not receive all messages, and the missing messages are not in the live server's journal after the test.

Tracking the message ID of a lost message shows that the message was sent to the receiver. Because it is a large message, the receiver tries to ack it before session.commit() is called. This appears to be some kind of pre-ack. The ack is sent to the live server, which tries to replicate it to the backup. But the backup is already shut down (step 4), so live waits for the cluster-connection call timeout (30s) before giving up. After 30s it stores the ack in its own journal and responds to the client. The problem is that the client has already timed out on its own call-timeout (30s) before the response arrives, so it gets a JMSException like:

16:26:12,983 Thread-27 ERROR [org.jboss.qa.hornetq.apps.clients.ReceiverClientAck:341] RETRY receive for host: 127.0.0.1, Trying to receive message with count: 57
javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41
	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:350)
	at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sendACK(ActiveMQSessionContext.java:421)
	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.acknowledge(ClientSessionImpl.java:696)
	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.doAck(ClientConsumerImpl.java:1035)
	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.acknowledge(ClientConsumerImpl.java:702)
	at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:96)
	at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:38)
	at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.getMessage(ActiveMQMessageConsumer.java:212)
	at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.receive(ActiveMQMessageConsumer.java:119)
	at org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.receiveMessage(ReceiverClientAck.java:333)
	at org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.run(ReceiverClientAck.java:169)
Caused by: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41]
	... 11 more

The problem for the client is that the message was acked on the live server and therefore will never be redelivered to the consumer. From the consumer's point of view the message is lost.
In the described scenario, messages are lost when an admin merely starts and then cleanly shuts down the backup server. Nothing caused a crash on the live server.
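
The timing race described above can be modeled with plain Java concurrency utilities. This is only an illustrative sketch, not Artemis code: the class `AckTimeoutRace`, the `Outcome` type, and the `responseLatencyMs` parameter are assumptions introduced here, and the timeouts are scaled down from 30s to a few hundred milliseconds. The simulated live server waits out the replication timeout, stores the ack, and then responds, while the client gives up after its own call timeout.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of the race; no Artemis classes are involved.
class AckTimeoutRace {

    // Result of one simulated blocking ack call.
    static final class Outcome {
        final boolean clientTimedOut;
        final boolean ackStoredOnLive;

        Outcome(boolean clientTimedOut, boolean ackStoredOnLive) {
            this.clientTimedOut = clientTimedOut;
            this.ackStoredOnLive = ackStoredOnLive;
        }
    }

    static Outcome simulate(long replicationTimeoutMs, long responseLatencyMs, long clientCallTimeoutMs) {
        AtomicBoolean ackStored = new AtomicBoolean(false);
        ExecutorService live = Executors.newSingleThreadExecutor();
        try {
            Future<?> response = live.submit(() -> {
                try {
                    Thread.sleep(replicationTimeoutMs); // backup is gone: live waits out the full timeout
                    ackStored.set(true);                // then stores the ack in its own journal
                    Thread.sleep(responseLatencyMs);    // assumed delay before the response reaches the client
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });

            boolean timedOut = false;
            try {
                // Client side: blocking send with a call timeout.
                response.get(clientCallTimeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                timedOut = true; // the client would surface this as a JMSException
            }

            // The server finishes its work regardless of the client's timeout.
            live.shutdown();
            live.awaitTermination(replicationTimeoutMs + responseLatencyMs + 1000, TimeUnit.MILLISECONDS);
            return new Outcome(timedOut, ackStored.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            live.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Equal timeouts on both sides, as in the report (scaled down from 30s).
        Outcome o = simulate(200, 200, 200);
        System.out.println("client timed out: " + o.clientTimedOut
                + ", ack stored on live: " + o.ackStoredOnLive);
    }
}
```

With equal timeouts on both sides, the model ends with the client seeing a failure and the ack stored on live, which is the combination that makes the message look lost to the consumer.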

Customer impact: If the backup server is shut down before synchronization with live is complete and a client consumes a large message, calling receive() on the consumer might time out on the client side while the message is acked on the live server and marked as delivered. From the client's point of view this message is lost.

clones: JBEAP-5258 (7.1.0) Lost large messages if backup is shutdown during synchronization (Verified)
is incorporated by: JBEAP-4679 (7.0.z) Upgrade Artemis from 1.1.0.SP17 to 1.1.0.SP18 (Verified)
