JBoss Enterprise Application Platform / JBEAP-5258

(7.1.0) Lost large messages if backup is shutdown during synchronization


      How to run the test locally:

      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      git checkout refactoring_modules
      groovy -DEAP_VERSION=7.0.0.ER5 PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      cd ../jboss-hornetq-testsuite/
      mvn clean test -Dtest=ReplicatedDedicatedFailoverTestWithMdb#testJustFailbackWithLargeMessages  -DfailIfNoTests=false  -Deap=7x  | tee log
      

      Test scenario:
      1. Start the live server with a replicated journal and queue testQueue0.
      2. Send 500 large messages to testQueue0 on live.
      3. Start the backup server and start receiving messages from testQueue0 (session in CLIENT_ACKNOWLEDGE mode).
      4. Before the backup is announced/synchronized with live, cleanly shut down the backup.
      5. Wait until the receiver consumes all messages.

      Expected result:
      Receiver consumed 500 messages. No losses or duplicates.
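
      The "no losses or duplicates" check can be sketched as a comparison of sent and received message IDs. This is only an illustration: the class MessageAudit and its method names are hypothetical and are not taken from eap-tests-hornetq, and the IDs are assumed to be plain strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the referenced test suite.
final class MessageAudit {

    // IDs that were sent but never received, kept in send order.
    static Set<String> lost(List<String> sent, List<String> received) {
        Set<String> missing = new LinkedHashSet<>(sent);
        missing.removeAll(received);
        return missing;
    }

    // IDs that were received more than once, in order of first repeat.
    static Set<String> duplicates(List<String> received) {
        Set<String> seen = new HashSet<>();
        Set<String> dups = new LinkedHashSet<>();
        for (String id : received) {
            if (!seen.add(id)) {
                dups.add(id);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Made-up IDs: m3 is never received, m2 is received twice.
        List<String> sent = Arrays.asList("m1", "m2", "m3", "m4");
        List<String> received = Arrays.asList("m1", "m2", "m2", "m4");
        System.out.println("lost=" + lost(sent, received));
        System.out.println("duplicates=" + duplicates(received));
    }
}
```

      For the scenario above, the expected result corresponds to both sets being empty after 500 messages have been sent.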

      Actual result:
      Messages are lost. The client does not receive all messages, and the missing messages are not in the live server's journal after the test.

      Tracking the message ID of a lost message shows that the message was sent to the receiver. Because it is a large message, the receiver tries to ack it before session.commit() is called, which appears to be some kind of pre-ack. This ack is sent to the live server, which tries to replicate it to the backup. But the backup is already shut down (step 4), so live waits for the cluster-connection call-timeout (30s) before giving up. After 30s it stores the ack in its own journal and responds to the client. The problem is that the client has already timed out on its own call-timeout (also 30s) before the response arrives, and it gets a JMSException like:

      16:26:12,983 Thread-27 ERROR [org.jboss.qa.hornetq.apps.clients.ReceiverClientAck:341] RETRY receive for host: 127.0.0.1, Trying to receive message with count: 57
      javax.jms.JMSException: AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41
      	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.sendBlocking(ChannelImpl.java:350)
      	at org.apache.activemq.artemis.core.protocol.core.impl.ActiveMQSessionContext.sendACK(ActiveMQSessionContext.java:421)
      	at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.acknowledge(ClientSessionImpl.java:696)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.doAck(ClientConsumerImpl.java:1035)
      	at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.acknowledge(ClientConsumerImpl.java:702)
      	at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:96)
      	at org.apache.activemq.artemis.core.client.impl.ClientMessageImpl.acknowledge(ClientMessageImpl.java:38)
      	at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.getMessage(ActiveMQMessageConsumer.java:212)
      	at org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.receive(ActiveMQMessageConsumer.java:119)
      	at org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.receiveMessage(ReceiverClientAck.java:333)
      	at org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.run(ReceiverClientAck.java:169)
      Caused by: ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119014: Timed out after waiting 30,000 ms for response when sending packet 41]
      	... 11 more
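
      The timing behind this exception can be sketched with a small deterministic model. It is not Artemis code: the class AckTimeoutRace, its fields, and the serverOverheadMs parameter (an assumed stand-in for the journal write and network time after live gives up on replication) are hypothetical. The only values taken from this report are the two 30s timeouts and the observation that live stores the ack after its wait.

```java
// Hypothetical model of the ack/timeout race described in this report.
final class AckTimeoutRace {

    static final class Outcome {
        final boolean clientTimedOut;
        final boolean ackStoredOnLive;

        Outcome(boolean clientTimedOut, boolean ackStoredOnLive) {
            this.clientTimedOut = clientTimedOut;
            this.ackStoredOnLive = ackStoredOnLive;
        }

        // The message is lost for the client when the ack is persisted
        // but the client never saw the response to its blocking ack.
        boolean messageLostForClient() {
            return clientTimedOut && ackStoredOnLive;
        }
    }

    // All arguments are in milliseconds, measured from the moment the ack is sent.
    static Outcome simulate(long clientCallTimeoutMs,
                            long replicationTimeoutMs,
                            long serverOverheadMs) {
        // Live blocks on replication to the (already stopped) backup until its
        // timeout expires, then stores the ack and replies.
        long responseAtMs = replicationTimeoutMs + serverOverheadMs;
        // The client gives up if no response arrived within its call-timeout.
        boolean clientTimedOut = responseAtMs >= clientCallTimeoutMs;
        // Per the report, live stores the ack whether or not the client still waits.
        boolean ackStoredOnLive = true;
        return new Outcome(clientTimedOut, ackStoredOnLive);
    }

    public static void main(String[] args) {
        // Both timeouts at 30s, as in the report; 5 ms overhead is an assumption.
        Outcome reported = simulate(30_000, 30_000, 5);
        System.out.println("lost for client: " + reported.messageLostForClient());
    }
}
```

      With equal timeouts the response can only arrive at or after the client's deadline, which matches the AMQ119014 timeout above. The model says nothing about whether changing either timeout is a safe workaround.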
      

      The problem for the client is that the message was acked on the live server, so it will never be redelivered to the consumer. From the consumer's point of view the message is lost.
      In the described scenario, messages are lost when an admin merely starts and then cleanly shuts down the backup server. Nothing crashed on the live server.

      Customer impact: If the backup server is shut down before synchronization with live is complete and a client is consuming a large message, the call to receive() on the consumer might time out on the client side even though the message is acked on the live server and marked as delivered. From the client's point of view this message is lost.

              rh-ee-ataylor Andy Taylor
              rhn-cservice-bbaranow Bartosz Baranowski