Loading...

Type: Bug
Resolution: Duplicate
Priority: Blocker
Fix Version/s: None
Affects Version/s: 7.1.0.DR15
Component/s: ActiveMQ
Labels:
None
Environment:

NFS

Affects Testing:

Regression
CDW blocker:
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Target Release:

7.1.0.GA
Steps to Reproduce:
Hide

git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git cd eap-tests-hornetq/scripts/ git checkout b17d5f547b72b0370248f9f13f87147db1ed9d87 groovy -DEAP_VERSION=7.1.0.DR15 PrepareServers7.groovy export WORKSPACE=$PWD export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap cd ../jboss-hornetq-testsuite/ mvn clean test -Dtest=DedicatedFailoverTestCase#testMultipleFailoverReceiver -DfailIfNoTests=false -Deap=7x | tee log
Show
git clone git: //git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git cd eap-tests-hornetq/scripts/ git checkout b17d5f547b72b0370248f9f13f87147db1ed9d87 groovy -DEAP_VERSION=7.1.0.DR15 PrepareServers7.groovy export WORKSPACE=$PWD export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap cd ../jboss-hornetq-testsuite/ mvn clean test -Dtest=DedicatedFailoverTestCase#testMultipleFailoverReceiver -DfailIfNoTests= false -Deap=7x | tee log

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

We have test which does 10 times failover/failback in shared store HA scenario to be 100% sure that HA is robust. The test at first send 50 000 messages. During sending there is no crash of server. After that there is sequence of 10 kills and restarts of server 1 which is doing during receiving of messages.

When journal is located on NFS I see a lot of following warnings during initial sending of messages. During sending the test doesn't perform any crash or restart of EAP. Producer is not aware of any error. All operations performed by producer are successful.

05:07:13,527 WARN  [org.apache.activemq.artemis.journal] (Thread-13 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@24331ae0-1686156632)) AMQ142007: Can not find record 61,454 during compact replay
05:07:13,530 WARN  [org.apache.activemq.artemis.journal] (Thread-13 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@24331ae0-1686156632)) AMQ142007: Can not find record 61,457 during compact replay
05:07:13,530 WARN  [org.apache.activemq.artemis.journal] (Thread-13 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@24331ae0-1686156632)) AMQ142007: Can not find record 61,460 during compact replay
05:07:13,530 WARN  [org.apache.activemq.artemis.journal] (Thread-13 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@24331ae0-1686156632)) AMQ142007: Can not find record 61,463 during compact replay
05:07:13,530 WARN  [org.apache.activemq.artemis.journal] (Thread-13 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@24331ae0-1686156632)) AMQ142007: Can not find record 61,466 during compact replay
05:07:13,530 WARN  [org.apache.activemq.artemis.journal] (Thread-13 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@24331ae0-1686156632)) AMQ142007: Can not find record 61,469 during compact replay

05:08:06,221 WARN  [org.apache.activemq.artemis.core.server] (Thread-10 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@7a9014c8-370821279)) AMQ222055: Error on deleting duplic
ate cache: java.lang.IllegalStateException: Cannot find add info 45135 on compactor or current records
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:937) [artemis-journal-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:822) [artemis-journal-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.deleteDuplicateID(AbstractJournalStorageManager.java:451) [artemis-server-1.5.3.003-redhat-1.jar:1.5.3.003-redha
t-1]
        at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl.addToCacheInMemory(DuplicateIDCacheImpl.java:263) [artemis-server-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl.access$000(DuplicateIDCacheImpl.java:41) [artemis-server-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl$AddDuplicateIDOperation.process(DuplicateIDCacheImpl.java:340) [artemis-server-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.core.postoffice.impl.DuplicateIDCacheImpl$AddDuplicateIDOperation.afterCommit(DuplicateIDCacheImpl.java:348) [artemis-server-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.afterCommit(TransactionImpl.java:546) [artemis-server-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl.access$100(TransactionImpl.java:38) [artemis-server-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.core.transaction.impl.TransactionImpl$3.done(TransactionImpl.java:321) [artemis-server-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.core.persistence.impl.journal.OperationContextImpl$1.run(OperationContextImpl.java:236) [artemis-server-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101) [artemis-commons-1.5.3.003-redhat-1.jar:1.5.3.003-redhat-1]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
        at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]

After producer finishes its job, receiver starts to receive messages. During receiving of messages server 1 is 10 times killed and restarted. Server 1 has shared store backup. When Artemis loads journal at server's start I see a lot of warnings listed below. I assume these warnings can be shown only if Artemis is killed during sending of messages.

At the end the test detects lost messages. I see that entire bulks of messages are lost. Producer commits each 100th message. When I tracked lost messages in server logs, I saw that they were successfully received by Artemis, paged and the transaction was committed.

05:12:06,359 WARN  [org.apache.activemq.artemis.core.server] (Thread-17 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@504f2fba-1586027459)) AMQ222029: Could not locate page transaction 45,131, ignoring message on position PagePositionImpl [pageNr=2, messageNr=3119, recordID=-1] on address=jms.queue.testQueue queue=jms.queue.testQueue
05:12:06,361 WARN  [org.apache.activemq.artemis.core.server] (Thread-17 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@504f2fba-1586027459)) AMQ222029: Could not locate page transaction 45,131, ignoring message on position PagePositionImpl [pageNr=2, messageNr=3120, recordID=-1] on address=jms.queue.testQueue queue=jms.queue.testQueue
05:12:06,361 WARN  [org.apache.activemq.artemis.core.server] (Thread-17 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@504f2fba-1586027459)) AMQ222029: Could not locate page transaction 45,131, ignoring message on position PagePositionImpl [pageNr=2, messageNr=3121, recordID=-1] on address=jms.queue.testQueue queue=jms.queue.testQueue

This is regression against 7.0.z. The same test passes with 7.0.5. With EAP 7.0.5 I don't see neither mentioned warnings nor lost messages.

Customer impact: Paged messages can be lost if the journal is located on NFS disc. Since NFS disc is used with shared store HA, this blocks using of shared store HA on NFS.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

Hide
test.log.zip
2017/03/31 10:30 AM
2.16 MB
Erich Duda
Extracting archive...
Show
test.log.zip
2017/03/31 10:30 AM
2.16 MB
Erich Duda

duplicates

JBEAP-9937 Lost messages in cluster with server crashes

Closed

is incorporated by

JBEAP-10454 Upgrade Artemis 1.5.4.jbossorg-003

Closed

relates to

ENTMQBR-585 AMQ7 replication failed

Closed

JBEAP-9937 Lost messages in cluster with server crashes

Closed

JBEAP-10475 IndexOutOfBoundsException during reading message if compacting is enabled

Closed

ARTEMIS-1114 Loading...

(1 relates to)

Details

Description

Attachments

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates