Uploaded image for project: 'WildFly'
  1. WildFly
  2. WFLY-7810

Artemis hangs during failback in remote JCA scenario

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Critical Critical
    • 11.0.0.Alpha1
    • None
    • JMS
    • None
    • Hide
      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      git checkout 4e300f3ffa4a632506b4afb452eaf0405e0a64f5
      groovy -DEAP_VERSION=7.1.0.DR9 PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      export JOURNAL_DIRECTORY_A=$WORKSPACE/journal-A
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=DedicatedFailoverTestCaseWithMdb#testKillWithFailback -DfailIfNoTests=false -Deap=7x -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1.0.DR9 | tee log
      
      Show
      git clone git: //git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git cd eap-tests-hornetq/scripts/ git checkout 4e300f3ffa4a632506b4afb452eaf0405e0a64f5 groovy -DEAP_VERSION=7.1.0.DR9 PrepareServers7.groovy export WORKSPACE=$PWD export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap export JOURNAL_DIRECTORY_A=$WORKSPACE/journal-A cd ../jboss-hornetq-testsuite/ mvn clean test -Dtest=DedicatedFailoverTestCaseWithMdb#testKillWithFailback -DfailIfNoTests= false -Deap=7x -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1.0.DR9 | tee log

      Remote JCA scenario:

      • There are 3 nodes
      • Node 1 and node 2 are Live-Backup pair (replicated HA)
      • Node 3 has MDB which remotely connects to node 1 and is able to do failover on node 2
      • During the test, node 1 is killed and started again

      Problem occurs when node 1 is started again. Servers are configured to do failback. When node 1 wants to become live again, something goes wrong with connection between node 1 and node 2. On node 1 I can see repeated WARN message [1]. Node 2 prints repeatedly WARN message [2].

      I can see the same issue also with 7.0.x. We haven't notice this error because the test didn't check state of servers after the failback.

      When I modify the test to not deploy MDB on node 3, the test passes without any unusual error. It seems the issue is related to this scenario.

      [1]

      09:59:09,197 WARN  [org.apache.activemq.artemis.core.server] (Thread-0 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$2@26357508-1826618556)) AMQ222137: Unable to announce backup, retrying: ActiveMQConnec
      tionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119012: Timed out waiting to receive initial broadcast from cluster]
              at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:747) [artemis-core-client-1.5.0.redhat-1.jar:1.5.0.redhat-1]
              at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:625) [artemis-core-client-1.5.0.redhat-1.jar:1.5.0.redhat-1]
              at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:607) [artemis-core-client-1.5.0.redhat-1.jar:1.5.0.redhat-1]
              at org.apache.activemq.artemis.core.server.cluster.BackupManager$BackupConnector$1.run(BackupManager.java:246) [artemis-server-1.5.0.redhat-1.jar:1.5.0.redhat-1]
              at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101) [artemis-commons-1.5.0.redhat-1.jar:1.5.0.redhat-1]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]
      

      [2]

      10:00:19,245 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:00:29,245 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:00:39,245 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:00:49,246 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:00:59,247 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:01:09,247 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:01:19,248 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:01:29,248 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:01:39,249 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:01:49,249 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:01:59,250 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      10:02:09,250 WARN  [org.apache.activemq.artemis.core.client] (Thread-135) AMQ212042: Timed out waiting for packet to be flushed
      

              jmesnil1@redhat.com Jeff Mesnil
              jmesnil1@redhat.com Jeff Mesnil
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: