JBoss Enterprise Application Platform / JBEAP-13228

(7.2.z) Undelivered messages after restart in cluster


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Critical
    • Fix Version/s: None
    • Affects Version/s: 7.0.7.GA, 7.1.0.CR2, 7.2.0.GA
    • Component/s: ActiveMQ
    • Labels: None
    • Steps to reproduce:
      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      git checkout master
      groovy -DEAP_VERSION=7.1.0.CR2 PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=RemoteJcaRebalancingTestCase#testLoadBalancingOfInboundConnectionsToClusterTwoJmsServerStopStart -DfailIfNoTests=false -Deap=7x | tee log
      

      Scenario:

      • There are 4 EAP servers.
      • Servers 1 and 3 are in a cluster configured with Netty static connectors. They hold the messages and are referred to in this text as JMS servers.
      • Servers 2 and 4 are not in the cluster. Their resource adapter (RA) is configured to connect to servers 1 and 3 using Netty static connectors, and they host MDBs which resend messages from InQueue to OutQueue. Servers 2 and 4 are referred to in this text as MDB servers.
      • At the beginning of the test, a standalone producer sends 10,000 messages to InQueue on server 1.
      • MDBs are deployed to the MDB servers.
      • The JMS servers are restarted.
      • The test waits until all messages are resent from InQueue to OutQueue.
      • A standalone receiver receives all messages from OutQueue on server 1.
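      The resend MDB described above can be sketched roughly as follows. This is a minimal, hypothetical example, not the test suite's actual bean; the class name, destination lookups, and activation config values are assumptions:

```java
import javax.annotation.Resource;
import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.inject.Inject;
import javax.jms.JMSContext;
import javax.jms.Message;
import javax.jms.MessageListener;
import javax.jms.Queue;

// Hypothetical resend MDB: consumes from InQueue (via the RA pointing at
// the JMS servers) and forwards each message to OutQueue.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType",
                              propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destinationLookup",
                              propertyValue = "java:/jms/queue/InQueue")
})
public class ResendMdb implements MessageListener {

    @Inject
    private JMSContext context;

    @Resource(lookup = "java:/jms/queue/OutQueue")
    private Queue outQueue;

    @Override
    public void onMessage(Message message) {
        // The container wraps the receive and the send in one transaction,
        // so the resend is atomic from the queues' point of view.
        context.createProducer().send(outQueue, message);
    }
}
```

      Deployed to the MDB servers, a bean like this drives the InQueue-to-OutQueue flow the test observes.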

      Expectation: All messages sent to InQueue are received from OutQueue.
      Reality: Occasionally a few messages remain stuck in InQueue on server 1. See details below.

      Customer impact: Some messages may not be delivered in time, and a server restart is needed to reschedule their delivery.

      This issue is not reported as a blocker because no messages are lost; it is an intermittent issue that occurs in an edge case, when events in the cluster happen in a specific order. The issue is also present in previous versions of EAP.

      Detailed description of the issue

      Note: Since Artemis does not log sufficiently in the code where the issue occurs, I added more log messages to be able to track what happened to the messages. The changes are attached in 0001-Added-logging.patch.

      I investigated the message with _AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750.

      In the trace logs on server 1, I can see that the message was sent to the MDB server and is in the delivering state. When I tracked the message down on the MDB server, I found the following log entries.

      05:23:59,696 TRACE [org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl] (Thread-22 (ActiveMQ-client-netty-threads)) handling packet PACKET(SessionReceiveClientLargeMessage)[type=76, channelID=10, packetObject=SessionReceiveClientLargeMessage, consumerID=0, deliveryCount=2, largeMessageSize=102400, message=ClientLargeMessageImpl[messageID=1089, durable=true, address=jms.queue.InQueue,userID=7006ae5f-9f77-11e7-8182-0015177911ca,properties=TypedProperties[__AMQ_CID=6d970d75-9f77-11e7-8182-0015177911ca,count=375,counter=376,_AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750,_AMQ_LARGE_SIZE=102400,color=RED]]]
      05:23:59,703 TRACE [org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl] (Thread-22 (ActiveMQ-client-netty-threads)) Adding message=ClientLargeMessageImpl[messageID=1089, durable=true, address=jms.queue.InQueue,userID=7006ae5f-9f77-11e7-8182-0015177911ca,properties=TypedProperties[__AMQ_CID=6d970d75-9f77-11e7-8182-0015177911ca,count=375,counter=376,_AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750,_AMQ_LARGE_SIZE=102400,color=RED]] to buffer.
      05:23:59,704 TRACE [org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl] (Thread-22 (ActiveMQ-client-netty-threads)) Checking handler=org.apache.activemq.artemis.ra.inflow.ActiveMQMessageHandler@1fdf07cd for message=ClientLargeMessageImpl[messageID=1089, durable=true, address=jms.queue.InQueue,userID=7006ae5f-9f77-11e7-8182-0015177911ca,properties=TypedProperties[__AMQ_CID=6d970d75-9f77-11e7-8182-0015177911ca,count=375,counter=376,_AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750,_AMQ_LARGE_SIZE=102400,color=RED]]
      05:23:59,704 TRACE [org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl] (Thread-22 (ActiveMQ-client-netty-threads)) Executor was not queued because consumer is stopped. Message=ClientLargeMessageImpl[messageID=1089, durable=true, address=jms.queue.InQueue,userID=7006ae5f-9f77-11e7-8182-0015177911ca,properties=TypedProperties[__AMQ_CID=6d970d75-9f77-11e7-8182-0015177911ca,count=375,counter=376,_AMQ_DUPL_ID=2fd54d8f-9f9d-4ed4-a008-fe903856358d1506072098750,_AMQ_LARGE_SIZE=102400,color=RED]]
      

      As you can see, the last log message is "Executor was not queued because consumer is stopped."

      Stopping of the consumer was most likely caused by a ClearAtFailover event:

      ntext=ActiveMQConsumerContext{id=0}, queueName=jms.queue.InQueue}::ClearAtFailover: java.lang.Throwable: trace
              at org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.clearAtFailover(ClientConsumerImpl.java:507) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.core.client.impl.ClientSessionImpl.handleFailover(ClientSessionImpl.java:946) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.reconnectSessions(ClientSessionFactoryImpl.java:771) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.failoverOrReconnect(ClientSessionFactoryImpl.java:614) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.handleConnectionFailure(ClientSessionFactoryImpl.java:504) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl.access$500(ClientSessionFactoryImpl.java:72) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
        at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$DelegatingFailureListener.connectionFailed(ClientSessionFactoryImpl.java:1165) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.callFailureListeners(AbstractRemotingConnection.java:67) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.core.protocol.core.impl.RemotingConnectionImpl.fail(RemotingConnectionImpl.java:208) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.fail(AbstractRemotingConnection.java:210) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.core.client.impl.ClientSessionFactoryImpl$CloseRunnable.run(ClientSessionFactoryImpl.java:996) [artemis-core-client-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:122) [artemis-commons-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_71]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_71]
              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_71]
      

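      The drop itself can be modeled with a small stand-alone sketch. This is a simplification of the consumer-side behavior seen in the trace above, with hypothetical names, not the real ClientConsumerImpl: a delivered message is buffered, but the handler is only run while the consumer is not stopped, and clearAtFailover stops the consumer and clears the buffer.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Simplified model of the consumer-side logic seen in the trace above.
public class StoppedConsumer {

    private final Queue<String> buffer = new ArrayDeque<>();
    private boolean stopped;
    private int handled;

    // Roughly what clearAtFailover does: stop the consumer and drop
    // anything buffered so far.
    public void clearAtFailover() {
        stopped = true;
        buffer.clear();
    }

    // Delivery path: buffer the message, then hand it to the handler only
    // if the consumer is running ("Executor was not queued because
    // consumer is stopped").
    public void handleMessage(String message) {
        buffer.add(message);
        if (stopped) {
            return; // message stays buffered and is never acknowledged
        }
        handled++;
        buffer.poll();
    }

    public int handledCount() { return handled; }
    public int stuckCount()   { return buffer.size(); }

    public static void main(String[] args) {
        StoppedConsumer c = new StoppedConsumer();
        c.handleMessage("m1");   // handled normally
        c.clearAtFailover();     // failover stops the consumer
        c.handleMessage("m2");   // message delivered after stop is dropped
        System.out.println("handled=" + c.handledCount()
                + " stuck=" + c.stuckCount()); // handled=1 stuck=1
    }
}
```

      The second message lands in the buffer but is never passed to a handler, so no acknowledgment is ever produced for it, which matches the trace.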
      This issue most likely happens when both JMS servers are restarted at once. The MDB server receives a Disconnect packet and triggers its connection failure listeners. Since the MDB creates 15 sessions by default, executing all of the connection failure procedures can take a while. In the meantime the JMS servers finish restarting and start sending messages to the MDB server. As mentioned above, these messages are thrown away by the ClearAtFailover event, which was triggered by the connection failure that happened a few seconds earlier.

      The JMS server which sent the messages waits for acknowledgments that they were received by the MDB server. These acknowledgments never arrive, so the messages stay in the delivering state in InQueue.
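      Under stated assumptions (15 sessions, failure listeners processed one at a time, redelivery arriving partway through), the race can be sketched as a toy timeline. The numbers and method names here are illustrative, not taken from Artemis:

```java
// Toy timeline of the race: clearAtFailover runs per session, and a
// redelivery that arrives while earlier sessions are already stopped is
// dropped without an acknowledgment.
public class FailoverRace {

    // Returns how many redelivered messages end up unacknowledged when
    // the restarted JMS server redelivers one message per session at the
    // moment session `redeliveryAt` is being cleared.
    public static int unackedAfterRace(int sessions, int redeliveryAt) {
        boolean[] stopped = new boolean[sessions];
        int unacked = 0;
        for (int s = 0; s < sessions; s++) {
            stopped[s] = true;                // clearAtFailover for session s
            if (s == redeliveryAt) {
                // Redelivery arrives now: every session stopped so far
                // drops its message and sends no ack.
                for (int t = 0; t <= s; t++) {
                    if (stopped[t]) {
                        unacked++;
                    }
                }
            }
        }
        return unacked;
    }

    public static void main(String[] args) {
        // With 15 sessions and redelivery arriving while session 7 is
        // being cleared, the first 8 sessions drop their message.
        System.out.println("unacked: " + unackedAfterRace(15, 7)); // unacked: 8
    }
}
```

      The point of the sketch is only the ordering: the later the redelivery arrives relative to the per-session cleanup, the more messages are silently dropped and remain in the delivering state on the JMS server.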

      Attachments:

  1. 0001-Added-logging.patch
          4 kB
          Erich Duda
        2. jms-server-trace.log.7z
          10.54 MB
          Erich Duda
        3. mdb-server-trace.log.7z
          3.52 MB
          Erich Duda
        4. test.log
          2.50 MB
          Erich Duda

      Assignee: Martyn Taylor (mtaylor1@redhat.com), inactive
      Reporter: Erich Duda (eduda_jira), inactive
      Votes: 0
      Watchers: 4
