Details
Bug
Resolution: Obsolete
Critical
7.1.0.ER1
None
Description
Customer impact: JMS consumer can hang after failover to backup server.
Test scenario:
- Start two EAP 7.1 servers in a colocated topology with a replicated journal
- Start sending messages to queue testQueue on server 1
- Start a consumer receiving messages from queue testQueue on server 1
- Cleanly shut down server 1
- The colocated backup on server 2 activates and the clients fail over
- Stop the producer and wait for the consumer to receive all messages from testQueue
Pass criteria: The number of sent and received messages is the same
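The counting logic behind the pass criterion can be sketched in plain Java. This is only a simulation: a BlockingQueue stands in for testQueue and poll(timeout) for consumer.receive(timeout); there is no broker, no failover, and the class and method names (FailoverScenarioSketch, runScenario) are hypothetical, not part of the test suite.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Plain-Java simulation of the pass criterion above; a BlockingQueue stands in
// for testQueue and poll(timeout) for consumer.receive(timeout). No broker,
// no failover -- this only sketches the counting logic the test asserts on.
public class FailoverScenarioSketch {

    static int runScenario(int toSend) throws InterruptedException {
        BlockingQueue<String> testQueue = new ArrayBlockingQueue<>(toSend);

        for (int i = 0; i < toSend; i++) {       // producer side
            testQueue.put("msg-" + i);
        }

        int received = 0;
        while (true) {                           // consumer side
            // A null return means the timeout elapsed with no message,
            // mirroring consumer.receive(timeout) returning null.
            String msg = testQueue.poll(100, TimeUnit.MILLISECONDS);
            if (msg == null) {
                break;   // in the reported bug this point is never reached: the consumer hangs
            }
            received++;
        }
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        int sent = 100;
        // Pass criterion: number of sent and received messages is the same
        System.out.println(runScenario(sent) == sent);
    }
}
```

In the real test the consumer never observes the timeout-and-drain path: it blocks inside receive(timeout) after failover, which is exactly what the stack trace below shows.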
Actual result: Sometimes, after the shutdown of server 1, the backup on server 2 logs an NPE:
11:54:34,743 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord::id=1 , usedFile = JournalFileImpl: (activemq-bindings-3.bindings id = 3, recordID = 3)
11:54:34,743 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord::id=1 , usedFile = JournalFileImpl: (activemq-bindings-3.bindings id = 3, recordID = 3)
11:54:34,744 ERROR [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord:java.lang.NullPointerException: java.lang.NullPointerException
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl$3.run(JournalImpl.java:859) [artemis-journal-1.5.5.002-redhat-1.jar:1.5.5.002-redhat-1]
    at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101) [artemis-commons-1.5.5.002-redhat-1.jar:1.5.5.002-redhat-1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_121]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_121]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_121]
It happens during activation of the backup on server 2. The producer continues sending messages, but the consumer hangs in the consumer.receive(timeout) method:
Stack trace of thread: Thread[Thread-842,5,main]
    java.lang.Object.wait(Native Method)
    org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.receive(ClientConsumerImpl.java:261)
    org.apache.activemq.artemis.core.client.impl.ClientConsumerImpl.receive(ClientConsumerImpl.java:388)
    org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.getMessage(ActiveMQMessageConsumer.java:211)
    org.apache.activemq.artemis.jms.client.ActiveMQMessageConsumer.receive(ActiveMQMessageConsumer.java:132)
    org.jboss.qa.hornetq.apps.clients.Receiver11.receiveMessage(Receiver11.java:140)
    org.jboss.qa.hornetq.apps.clients.ReceiverClientAck.run(ReceiverClientAck.java:81)
The NPE is thrown from JournalImpl.appendDeleteRecord, line 859:
// record==null here could only mean there is a compactor
// computing the delete should be done after compacting is done
if (record == null) {
    compactor.addCommandDelete(id, usedFile);   // <- NPE is thrown here
The comment says that record may be null only while a compactor is in progress, but the trace logs show that no compaction was running at the moment the NPE was thrown, so the compactor is not responsible for the NPE.
The actual problem is that the given record was deleted twice in quick succession, and the second call failed because the record was already gone. The traces show:
11:54:34,741 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) scheduling appendDeleteRecord::id=1
11:54:34,741 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) scheduling appendDeleteRecord::id=1
...
11:54:34,743 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord::id=1 , usedFile = JournalFileImpl: (activemq-bindings-3.bindings id = 3, recordID = 3)
11:54:34,743 TRACE [org.apache.activemq.artemis.core.journal.impl.JournalImpl] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@64266722)) appendDeleteRecord::id=1
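The failure mode can be illustrated with a stripped-down sketch. MiniJournal is hypothetical, not the Artemis code: the records map and the null compactor field only mimic the shape of JournalImpl around line 859, where a missing record is assumed to mean "a compactor took it".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for JournalImpl (not Artemis code): a second delete
// of the same record id finds nothing, falls into the "compactor must be
// running" branch, and dereferences the null compactor.
public class MiniJournal {

    private final Map<Long, String> records = new HashMap<>();
    private final Object compactor = null;   // no compaction running, matching the traces

    void add(long id, String data) {
        records.put(id, data);
    }

    void appendDeleteRecord(long id) {
        String record = records.remove(id);
        if (record == null) {
            // The code assumes record == null means a compactor owns the
            // record, so it delegates the delete -- and NPEs, because no
            // compactor exists.
            compactor.toString();
        }
    }

    static boolean secondDeleteThrowsNpe() {
        MiniJournal journal = new MiniJournal();
        journal.add(1L, "binding-record");
        journal.appendDeleteRecord(1L);          // first scheduled delete succeeds
        try {
            journal.appendDeleteRecord(1L);      // second delete: record is already gone
            return false;
        } catch (NullPointerException e) {
            return true;                         // same failure mode as the backup's log
        }
    }

    public static void main(String[] args) {
        System.out.println(secondDeleteThrowsNpe());
    }
}
```

This only demonstrates why the double-scheduled delete produces an NPE in that branch; it says nothing about why the activation path schedules the delete twice in the first place.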
I could not find out why one record was scheduled twice for deletion. I should also note that I am not sure whether this NPE is responsible for the hanging consumer.