  1. JBoss Enterprise Application Platform
  2. JBEAP-12471

Lost messages during redistribution over Artemis core bridge


    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • 7.1.0.CR1
    • 7.1.0.ER3
    • ActiveMQ
    • None
    • Regression
    • Steps to reproduce:
      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      git checkout 20283f4076ee2245570f9781c4dbc47e322792c5
      groovy -DEAP_VERSION=7.1.0.ER3 PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=ClusterTestCase#clusterTestWithKills -DfailIfNoTests=false -Deap=7x | tee log
      or 
      mvn clean test -Dtest=DedicatedFailoverCoreBridges#testFailoverKillWithBridgeWithOneStaticNIOConnector -DfailIfNoTests=false -Deap=7x | tee log
      

      This is a regression against EAP 7.1.0.ER2 and EAP 7.0.

      Customer story
      A large number of messages can be lost when core bridge redistribution is used. The messages get lost in the Artemis cluster. This is a severe issue that prevents the use of clustering in production.

      Topology

      • node1 and node2 - dedicated live-backup pair
      • node3 - with core bridge resending messages from InQueue on node3 to OutQueue on node1

      Some messages are not delivered to the queue on node1/node2.
      According to the trace logs of the bridge server (node3), the message is successfully sent over the bridge and acknowledged:

      11:50:16,182 TRACE [org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl] (Thread-0 (ActiveMQ-client-netty-threads)) BridgeImpl::sendAcknowledged received confirmation for message LargeServerMessage[messageID=601,durable=true,userID=f0f81ccc-7607-11e7-804f-001b217d6d57,priority=4, timestamp=Mon Jul 31 11:50:13 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[__AMQ_CID=ef7ceddc-7607-11e7-804f-001b217d6d57,count=157,color=RED,_AMQ_BRIDGE_DUP=[DED6 B129 7607 11E7 B35D 001B 217D 6D57 0000 0000 0000 0259),counter=158,_AMQ_DUPL_ID=11976d90-f23b-4f1e-867e-c12b0573a2fe1501516213773,_AMQ_LARGE_SIZE=409617]]@1197599459
      11:50:16,182 TRACE [org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl] (Thread-0 (ActiveMQ-client-netty-threads)) BridgeImpl::sendAcknowledged bridge BridgeImpl@3567a236 [name=my-bridge, queue=QueueImpl[name=jms.queue.InQueue, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=ded6b129-7607-11e7-b35d-001b217d6d57]]@11f0ea44 targetConnector=ServerLocatorImpl (identity=Bridge my-bridge) [initialConnectors=[TransportConfiguration(name=core-bridge-connector-0, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&localAddress=127-0-0-1&useNio=true&host=127-0-0-1&useNioGlobalWorkerPool=true], discoveryGroupConfiguration=null]] Acking Reference[601]:RELIABLE:LargeServerMessage[messageID=601,durable=true,userID=f0f81ccc-7607-11e7-804f-001b217d6d57,priority=4, timestamp=Mon Jul 31 11:50:13 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[__AMQ_CID=ef7ceddc-7607-11e7-804f-001b217d6d57,count=157,color=RED,_AMQ_BRIDGE_DUP=[DED6 B129 7607 11E7 B35D 001B 217D 6D57 0000 0000 0000 0259),counter=158,_AMQ_DUPL_ID=11976d90-f23b-4f1e-867e-c12b0573a2fe1501516213773,_AMQ_LARGE_SIZE=409617]]@1197599459 on queue QueueImpl[name=jms.queue.InQueue, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=ded6b129-7607-11e7-b35d-001b217d6d57]]@11f0ea44
      

      However, searching for the _AMQ_DUPL_ID property of the given message shows only a single trace log entry on node1:

      11:50:16,146 TRACE [org.apache.activemq.artemis.core.server.impl.ServerSessionImpl] (Thread-11 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@7d0731a8)) sendLarge::LargeServerMessage[messageID=665,durable=true,userID=f0f81ccc-7607-11e7-804f-001b217d6d57,priority=4, timestamp=Mon Jul 31 11:50:13 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[__AMQ_CID=ef7ceddc-7607-11e7-804f-001b217d6d57,count=157,color=RED,_AMQ_BRIDGE_DUP=[DED6 B129 7607 11E7 B35D 001B 217D 6D57 0000 0000 0000 0259),counter=158,_AMQ_DUPL_ID=11976d90-f23b-4f1e-867e-c12b0573a2fe1501516213773,_AMQ_LARGE_SIZE=409617]]@1299746455
      

      As a result, the message is missing from the target server's queue.
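
The manual search described above can be repeated for every message at once. The following is a minimal sketch, not part of the reproducer test suite: it groups trace log lines by the `_AMQ_DUPL_ID` value they mention and reports how many entries each ID has on the bridge node and on the target node. The function names are hypothetical, and the regular expression assumes the property is printed exactly as in the excerpts above (terminated by a comma or a closing bracket).

```python
import re
from collections import defaultdict

# Captures the duplicate-detection ID as printed inside TypedProperties[...].
# Assumption: the value ends at the next comma, closing bracket, or whitespace.
DUPL_ID_RE = re.compile(r"_AMQ_DUPL_ID=([^,\]\s]+)")


def group_by_dupl_id(lines):
    """Group log lines by the _AMQ_DUPL_ID value they mention."""
    groups = defaultdict(list)
    for line in lines:
        match = DUPL_ID_RE.search(line)
        if match:
            groups[match.group(1)].append(line.rstrip("\n"))
    return dict(groups)


def entry_counts(bridge_lines, target_lines):
    """Return {dupl_id: (entries in bridge log, entries in target log)}."""
    bridge = group_by_dupl_id(bridge_lines)
    target = group_by_dupl_id(target_lines)
    return {
        dupl_id: (len(bridge.get(dupl_id, [])), len(target.get(dupl_id, [])))
        for dupl_id in sorted(set(bridge) | set(target))
    }
```

Both functions accept any iterable of lines, so open file objects for the node3 and node1 server logs can be passed directly. For the message shown above, the result would contain two entries on node3 (the confirmation and the ack) and one on node1. Whether a given count pattern indicates a lost message is an assumption to verify against the queue contents, not something this sketch decides.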

      Notes

      • This is an intermittent failure.
      • The reproducer test focuses on failover; however, the message loss happens before the live node1 is killed.

      Issue is under investigation

              csuconic@redhat.com Clebert Suconic
              mstyk_jira Martin Styk (Inactive)