Type: Bug
Resolution: Won't Do
Priority: Critical
Fix Version/s: None
Affects Version/s: 7.0.3.GA, 7.0.5.CR1
Component/s: ActiveMQ
Labels: None

CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Target Release: 7.0.z.GA
Steps to Reproduce:

git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
cd eap-tests-hornetq/scripts/
git checkout 795237ecf5919a611f904920b33d84e574587975
groovy -DEAP_VERSION=7.1.0.DR11 PrepareServers7.groovy
export WORKSPACE=$PWD
export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
cd ../jboss-hornetq-testsuite/
mvn clean test -Dtest=ReplicatedDedicatedFailoverTestCase#testFailbackClientAckTopic \
    -DfailIfNoTests=false -Deap=7x \
    -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1.0.DR11 | tee log

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

In replicated HA scenarios, replication breaks with the warning shown in [1].

This issue was already discussed in JBEAP-4742 (see the comments there). As a solution, the timeout was made configurable: you can set it with the call-timeout attribute of cluster-connection.
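
For reference, a minimal sketch of raising call-timeout through the management CLI. The resource address is an assumption: it presumes the messaging server is named "default" and the cluster connection is named "my-cluster", and that the CLI can reach the server on its default management port. Adjust all three to the actual setup. The value is given in milliseconds, so 120000 corresponds to the 2 minutes tried below.

```shell
# Assumption: server "default", cluster connection "my-cluster",
# management interface reachable at the CLI's default address.
"$JBOSS_HOME_1/bin/jboss-cli.sh" --connect \
  --command='/subsystem=messaging-activemq/server=default/cluster-connection=my-cluster:write-attribute(name=call-timeout,value=120000)'

# Reload so the changed attribute takes effect.
"$JBOSS_HOME_1/bin/jboss-cli.sh" --connect --command=':reload'
```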

I had seen this issue in our CI and suspected an environment problem caused by slow NFS. I have now dug into it further; here are my findings.

Something seems to hang the synchronization process, because increasing call-timeout doesn't help.

I tracked the sending and receiving of synchronization packets in the trace logs. There is a 60s window in which no packet is sent or handled. The stalled packets are received only after [1] is printed to the log and replication is canceled.

When I set call-timeout to 2 minutes, replication fails because of a connection timeout error.

I can easily reproduce the issue in our CI, but I can't reproduce it locally on my laptop. Maybe there is a race condition that shows up only in a slower environment.

I can see the same issue with 7.0.x.

Debugging tip: on each server, a single thread takes care of sending and handling replication packets. You can track these threads in the trace logs; see the attachment.
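
One way to spot a stall like the 60s window described above is to look for long pauses between consecutive trace lines of a single thread. The helper below is only a sketch: find_gaps is a hypothetical name, and it assumes the log layout seen in [1] ("HH:MM:SS,mmm LEVEL [category] (thread) message") and a log that does not cross midnight.

```shell
#!/bin/sh
# find_gaps THREAD MIN_GAP_SECONDS LOGFILE
# Prints every pause of at least MIN_GAP_SECONDS between two consecutive
# log lines written by THREAD. Hypothetical helper, not part of EAP.
find_gaps() {
    thread="$1"; min_gap="$2"; logfile="$3"
    LC_ALL=C awk -v thread="($thread)" -v min_gap="$min_gap" '
        # Consider only lines that start with a timestamp and name the thread.
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9],[0-9][0-9][0-9] / && index($0, thread) {
            split($1, t, /[:,]/)
            # Work in whole milliseconds to avoid floating-point rounding.
            now = ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
            if (seen && now - prev >= min_gap * 1000)
                printf "%s -> %s (%.3f s)\n", prev_stamp, $1, (now - prev) / 1000
            prev = now; prev_stamp = $1; seen = 1
        }
    ' "$logfile"
}
```

For example, `find_gaps Thread-131 60 server.log` would list every pause of a minute or more for the thread named in [1]; the thread name differs per run, and server.log stands in for whichever trace log is being inspected.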

[1]

10:43:58,180 WARN  [org.apache.activemq.artemis.core.server] (Thread-131) AMQ222207: The backup server is not responding promptly introducing latency beyond the limit. Replication server being disconnected now.

Customer impact: Replication between Live and Backup may fail and is not restored automatically. An admin has to detect the situation and restart the server that acts as Backup.

clones

JBEAP-7968 (7.1.0) The backup server is not responding promptly introducing latency beyond the limit.

Closed

incorporates

ENTMQBR-556 Sync won't catch up on replication under load

Closed

Assignee: Bartosz Baranowski

Reporter: Erich Duda (Inactive)

Votes: 0

Watchers: 5

Created: 2017/02/02 8:33 AM

Updated: 2021/10/24 5:42 AM

Resolved: 2019/09/25 3:26 AM
