Uploaded image for project: 'JBoss Enterprise Application Platform'
  1. JBoss Enterprise Application Platform
  2. JBEAP-11576

Replicated journal - NPE during backup activation causes that consumer on queue hangs forever

    XMLWordPrintable

Details

    • Bug
    • Resolution: Obsolete
    • Critical
    • 7.2.0.GA.CR1
    • 7.1.0.ER1
    • ActiveMQ
    • None
    • Hide

      This is not 100% reproducer, it seems that chance to hit the issue is rather low:

      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      git checkout master
      
      groovy -DEAP_VERSION=7.1.0.ER1 PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      mvn clean test -Dtest=ReplicatedColocatedClusterFailoverTestCase#testFailoverClientAckQueueShutDown -DfailIfNoTests=false -Deap=7x  | tee log
      
      Show
      This is not 100% reproducer , it seems that chance to hit the issue is rather low: git clone git: //git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git cd eap-tests-hornetq/scripts/ git checkout master groovy -DEAP_VERSION=7.1.0.ER1 PrepareServers7.groovy export WORKSPACE=$PWD export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap cd ../jboss-hornetq-testsuite/ mvn clean test -Dtest=ReplicatedColocatedClusterFailoverTestCase#testFailoverClientAckQueueShutDown -DfailIfNoTests= false -Deap=7x | tee log

    Description

      Customer impact: JMS consumer can hang after failover to backup server.

      Test scenario:

      • Start 2 EAP 7.1 servers in colocated topology with replicated journal
      • Start sending messages to server1 to queue testQueue
      • Start consumer consuming messages from queue testQueue from server1
      • Cleanly shutdown server 1
      • Colocated backup on server 2 activates and clients failover
      • Stop producer and wait for consumer to receive all mesasges fromt testQueue
        Pass criteria: Number of send and received messages is the same

      Actual result: Sometimes after shutdown of server 1, backup on server 2 logs NPE:

      11:54:34,743 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord::id=1
      , usedFile = JournalFileImpl: (activemq-bindings-3.bindings id = 3, recordID = 3)
      11:54:34,743 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord::id=1
      , usedFile = JournalFileImpl: (activemq-bindings-3.bindings id = 3, recordID = 3)
      11:54:34,744 ERROR [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord:java.
      lang.NullPointerException: java.lang.NullPointerException
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl$3.run(JournalImpl.java:859) [artemis-journal-1.5.5.002-redhat-1.jar:1.5.5.002-redhat-1]
              at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101) [artemis-commons-1.5.5.002-redhat-1.jar:1.5.5.002-redhat-1]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]
      

      It happens during activation of backup on server 2. Producer continues sending messages but consumer hangs in consumer.receive(timeout) method:

      Stack trace of thread: Thread[Thread-842,5,main]
      ---java.lang.Object.wait(Native Method)
      ---org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.receive(ClientConsumerImpl.java:261)
      ---org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.receive(ClientConsumerImpl.java:388)
      ---org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.getMessage(ActiveMQMessageConsumer.java:211)
      ---org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.receive(ActiveMQMessageConsumer.java:132)
      ---org.jboss.qa.hornetq.apps.clients.Receiver11.receiveMessage(Receiver11.java:140)
      ---org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.run(ReceiverClientAck.java:81)
      

      NPE is thrown from JournalImpl.appendDeleteRecord:859:

      // record==null here could only mean there is a compactor
      // computing the delete should be done after compacting is done
       if (record == null) {
                        compactor.addCommandDelete(id, usedFile); <- NPE is thrown here
      

      There is comment that record by be null only if compactor is in progress but by checking trace logs, compactor was not in progress in the moment when NPE was thrown and is not responsible for NPE.

      The problem is that the given record was deleted 2 times in quick succession and 2nd call failed as record was null. There are following traces:

      11:54:34,741 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) scheduling appendDeleteRecord::id=1
      11:54:34,741 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) scheduling appendDeleteRecord::id=1
      ...
      11:54:34,743 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord::id=1
      , usedFile = JournalFileImpl: (activemq-bindings-3.bindings id = 3, recordID = 3)
      11:54:34,743 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord::id=1
      

      I could not find why one record was scheduled twice for deletion. I should also point here that i'm not sure if this NPE is responsible for hanging consumer.

      Attachments

        Activity

          People

            rhn-cservice-bbaranow Bartosz Baranowski
            mnovak1@redhat.com Miroslav Novak
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: