Uploaded image for project: 'WildFly'
  1. WildFly
  2. WFLY-10474

AMQ119012: Timed out waiting to receive initial broadcast from cluster

    XMLWordPrintable

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Blocker
    • 14.0.0.Beta2
    • 13.0.0.Beta1
    • JMS
    • Hide
      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      groovy -DEAP_ZIP_URL=https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/early-testing-messaging-prepare/447//artifact/jboss-eap.zip PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=ReplicatedColocatedClusterFailoverTestCase#testFailbackTransAckQueueTcpStack -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1527157321-wildfly-master-SNAPSHOT -DfailIfNoTests=false -Deap=7x | tee log
      
      Show
      git clone git: //git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git cd eap-tests-hornetq/scripts/ groovy -DEAP_ZIP_URL=https: //eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/early-testing-messaging-prepare/447//artifact/jboss-eap.zip PrepareServers7.groovy export WORKSPACE=$PWD export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap cd ../jboss-hornetq-testsuite/ mvn clean test -Dtest=ReplicatedColocatedClusterFailoverTestCase#testFailbackTransAckQueueTcpStack -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1527157321-wildfly-master-SNAPSHOT -DfailIfNoTests= false -Deap=7x | tee log

    Description

      In messaging HA scenarios we see that tests sometimes fail because of exception [1]. The exception is thrown from ServerLocatorImpl::createSessionFactory which waits until initial broadcast from cluster is received, but it is not received in 10 seconds timeout. In following scenario the issue causes split brain - both live and backup are active at the same time.

      Scenario

      • There are two Wildfly/Artemis servers configured as Live-Backup pair
      • Live server is killed
      • Backup server becomes new Live server
      • Live server is restarted and failback is performed

      Expectation: Failback performs successfully. Original Live becomes Live again and original Backup becomes Backup again.

      Reality: After the Live is restarted, it does not detect that there is already another active Live with the same nodeId and it activates (becomes Live) so both Live and Backup are active at the same time.

      Technical details

      I found out that original Live server does not detect that its Backup is active, because SharedNothingLiveActivation::isNodeIdUsed returns false. I added more logs to this method and found out that the false is returned from line [2] after that the exception [1] is thrown.

      Investigation report

      I tried to downgrade Artemis to 1.5.5.jbossorg-010 version but I hit the same issue. So the issue is not caused by recent Artemis upgrade.

      It is regression against EAP 7.1.0.GA but it is not regression against Wildfly 12.

      We see the issue only with JGroups. I tried to run the test with Netty discovery and it passed 50 times in row.

      [1]

      ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119012: Timed out waiting to receive initial broadcast from cluster]
              at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:749) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
              at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:627) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
              at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connectNoWarnings(ServerLocatorImpl.java:633) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
              at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation.isNodeIdUsed(SharedNothingLiveActivation.java:280) [artemis-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
              at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation.run(SharedNothingLiveActivation.java:90) [artemis-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
              at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStart(ActiveMQServerImpl.java:539) [artemis-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
              at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.start(ActiveMQServerImpl.java:485) [artemis-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
              at org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl.start(JMSServerManagerImpl.java:413) [artemis-jms-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
              at org.wildfly.extension.messaging.activemq.jms.JMSService.doStart(JMSService.java:205) [wildfly-messaging-activemq-13.0.0.Beta2-SNAPSHOT.jar:13.0.0.Beta2-SNAPSHOT]
              at org.wildfly.extension.messaging.activemq.jms.JMSService.access$000(JMSService.java:64) [wildfly-messaging-activemq-13.0.0.Beta2-SNAPSHOT.jar:13.0.0.Beta2-SNAPSHOT]
              at org.wildfly.extension.messaging.activemq.jms.JMSService$1.run(JMSService.java:99) [wildfly-messaging-activemq-13.0.0.Beta2-SNAPSHOT.jar:13.0.0.Beta2-SNAPSHOT]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_171]
              at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_171]
              at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
              at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
              at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
              at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1378) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
              at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_171]
              at org.jboss.threads.JBossThread.run(JBossThread.java:485) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
      

      [2] https://github.com/rh-messaging/jboss-activemq-artemis/blob/465eb8733e465a949482cf618ad475b0a9e8afcd/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/impl/SharedNothingLiveActivation.java#L281

      Attachments

        Activity

          People

            jmesnil1@redhat.com Jeff Mesnil
            eduda_jira Erich Duda (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: