AMQ Broker / ENTMQBR-1078

Regression in cluster tests with network failures

      Steps to reproduce
      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git
      cd eap-tests-hornetq/scripts/
      groovy -DEAP_ZIP_URL=https://eap-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/EAP7/view/EAP7-JMS/view/early-testing/view/tooling/job/early-testing-messaging-prepare/206/artifact/jboss-eap.zip PrepareServers7.groovy
      export WORKSPACE=$PWD
      export JBOSS_HOME_1=$WORKSPACE/server1/jboss-eap
      export JBOSS_HOME_2=$WORKSPACE/server2/jboss-eap
      export JBOSS_HOME_3=$WORKSPACE/server3/jboss-eap
      export JBOSS_HOME_4=$WORKSPACE/server4/jboss-eap
      
      cd ../jboss-hornetq-testsuite/
      
      mvn clean test -Dtest=NetworkFailuresHornetQCoreBridges#testNetworkFailureSmallMessages -DfailIfNoTests=false -Deap=7x -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1517218654-SNAPSHOT | tee log
      
      or 
      
      mvn clean test -Dtest=Lodh4TestCase#testFailOfOneServer -Deap7.org.jboss.qa.hornetq.apps.clients.version=7.1517906210-SNAPSHOT -DfailIfNoTests=false -Deap=7x | tee log
      

      Scenario

      • There are two Artemis brokers configured to form a cluster
      • There is a producer sending messages to broker 1 and a receiver consuming messages from broker 2
      • Between the brokers there is a proxy that simulates a network failure
      • The proxy is stopped and restarted several times to simulate the network failure
      • The test expects that all messages sent to broker 1 will be received by the receiver from broker 2 (despite the network failures)
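For reference, a two-broker cluster like the one in this scenario is typically wired together with a cluster-connection in each broker's broker.xml. The sketch below is illustrative only (connector names, host, and port are mine, not the test suite's); in this setup the static connector would point at the proxy rather than directly at the other broker:

```xml
<!-- broker.xml on broker 1 (broker 2 mirrors this, pointing back the other way) -->
<connectors>
   <!-- this broker's own connector, referenced by the cluster-connection -->
   <connector name="netty-connector">tcp://127.0.0.1:61616</connector>
   <!-- points at the proxy in front of broker 2, not at broker 2 directly -->
   <connector name="broker2-via-proxy">tcp://127.0.0.1:61617</connector>
</connectors>

<cluster-connections>
   <cluster-connection name="my-cluster">
      <connector-ref>netty-connector</connector-ref>
      <retry-interval>500</retry-interval>
      <reconnect-attempts>-1</reconnect-attempts>
      <message-load-balancing>ON_DEMAND</message-load-balancing>
      <max-hops>1</max-hops>
      <static-connectors>
         <connector-ref>broker2-via-proxy</connector-ref>
      </static-connectors>
   </cluster-connection>
</cluster-connections>
```

With `reconnect-attempts` set to -1 the bridge should retry indefinitely, which is why the give-up behaviour described below is unexpected.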

      Reality: after the proxy is stopped and restarted, the cluster is unable to re-form. Both brokers repeatedly try to reconnect to each other, without success.
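The proxy stop/restart used in the scenario above can be sketched as a tiny single-host TCP forwarder (illustrative only, not the test suite's actual proxy tooling): closing the listening socket severs the path between the two sides, and reopening it on the same port restores it. A trivial echo server stands in for the broker so the sketch is self-contained:

```java
import java.io.*;
import java.net.*;

public class ProxyFailureSketch {

    // Minimal TCP proxy whose stop()/start() models the simulated network failure.
    static class TinyProxy {
        private final int targetPort;
        private int listenPort;              // fixed after the first start()
        private volatile ServerSocket listener;

        TinyProxy(int targetPort) { this.targetPort = targetPort; }

        int port() { return listenPort; }

        void start() throws IOException {
            listener = new ServerSocket();
            listener.setReuseAddress(true);  // allow re-binding the same port on restart
            listener.bind(new InetSocketAddress("127.0.0.1", listenPort));
            listenPort = listener.getLocalPort();
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        Socket client = listener.accept();
                        Socket target = new Socket("127.0.0.1", targetPort);
                        pump(client, target);   // client -> "broker"
                        pump(target, client);   // "broker" -> client
                    }
                } catch (IOException ignored) { } // listener closed by stop()
            });
            acceptor.setDaemon(true);
            acceptor.start();
        }

        // "Network failure": new connections through the proxy are refused.
        void stop() throws IOException { listener.close(); }

        private static void pump(Socket in, Socket out) {
            Thread t = new Thread(() -> {
                try { in.getInputStream().transferTo(out.getOutputStream()); }
                catch (IOException ignored) { }
            });
            t.setDaemon(true);
            t.start();
        }
    }

    static String roundTrip(int port) throws IOException {
        try (Socket s = new Socket("127.0.0.1", port)) {
            s.getOutputStream().write("ping\n".getBytes());
            return new BufferedReader(new InputStreamReader(s.getInputStream())).readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in "broker": a trivial echo server on an ephemeral port.
        ServerSocket broker = new ServerSocket(0, 50, InetAddress.getByName("127.0.0.1"));
        Thread echo = new Thread(() -> {
            try {
                while (true) {
                    Socket s = broker.accept();
                    Thread h = new Thread(() -> {
                        try { s.getInputStream().transferTo(s.getOutputStream()); }
                        catch (IOException ignored) { }
                    });
                    h.setDaemon(true);
                    h.start();
                }
            } catch (IOException ignored) { }
        });
        echo.setDaemon(true);
        echo.start();

        TinyProxy proxy = new TinyProxy(broker.getLocalPort());
        proxy.start();
        System.out.println("through proxy: " + roundTrip(proxy.port()));

        proxy.stop();                         // simulate the network failure
        try {
            roundTrip(proxy.port());
        } catch (IOException e) {
            System.out.println("proxy down: connection failed");
        }

        proxy.start();                        // the network "heals"
        System.out.println("after restart: " + roundTrip(proxy.port()));
    }
}
```

The point of the simulation is the last step: after the proxy comes back, traffic flows again at the TCP level, so the brokers themselves should be able to re-establish their bridge.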

      Customer scenario: a messaging cluster is unable to recover after network failures.

      Investigation of the issue

      I investigated why the brokers are unable to reconnect and found that every reconnection attempt is abandoned because there is no topology record for the nodeId being connected to. The reconnection attempt therefore ends here [1].

      I compared the behavior with Artemis 1.x and found that Artemis 2.x removes the topology member when a connection failure is detected, whereas Artemis 1.x does not. Commenting out the line [2], which is not present in 1.x, fixed the issue.
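The interaction between [1] and [2] can be illustrated with a toy model (class, method, and node names here are mine, not Artemis's): if the failure handler removes the topology entry, the subsequent reconnect guard finds no record for the nodeId and gives up, which is exactly the dead end observed in the test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of the topology bookkeeping (illustrative names, not Artemis classes):
// the topology is a map from nodeId to connector info; the 2.x failure path
// removes the entry (the line at [2]), and the reconnect path aborts when the
// target nodeId has no topology record (the check at [1]).
public class TopologyRemovalSketch {
    static final Map<String, String> topology = new ConcurrentHashMap<>();

    static void connectionFailed(String nodeId, boolean removeOnFailure) {
        if (removeOnFailure) {          // Artemis 2.x behaviour
            topology.remove(nodeId);    // the line that, commented out, fixed the test
        }                               // Artemis 1.x keeps the record
    }

    static boolean tryReconnect(String nodeId) {
        // Mirrors the guard that made reconnection give up:
        // no topology record for the target nodeId -> abort the attempt.
        return topology.containsKey(nodeId);
    }

    public static void main(String[] args) {
        topology.put("broker-2", "tcp://host2:61616");
        connectionFailed("broker-2", true);   // 2.x: member removed on failure
        boolean after2x = tryReconnect("broker-2");

        topology.put("broker-2", "tcp://host2:61616");
        connectionFailed("broker-2", false);  // 1.x: member kept
        boolean after1x = tryReconnect("broker-2");

        System.out.println("2.x reconnects: " + after2x + ", 1.x reconnects: " + after1x);
    }
}
```

Under this model the 2.x broker can never retry after the proxy comes back, while the 1.x broker still has the peer's record and reconnects, matching the observed regression.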

      [1] https://github.com/apache/activemq-artemis/blob/b66d0f7ac40001cce14ca7146e74720504ff9eb1/artemis-core-client/src/main/java/org/apache/activemq/artemis/core/client/impl/ServerLocatorImpl.java#L659
      [2] https://github.com/apache/activemq-artemis/blob/b66d0f7ac40001cce14ca7146e74720504ff9eb1/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/cluster/impl/BridgeImpl.java#L782

              Martyn Taylor (Inactive)
              Roman Vais (Inactive)