JBoss Enterprise Application Platform / JBEAP-3581

[Artemis Testsuite] NettySymmetricClusterWithBackupTest#testMixtureLoadBalancedAndNonLoadBalancedQueuesAddQueuesAndConsumersBeforeAllServersAreStarted

Details

    • Bug
    • Resolution: Done
    • Major
    • 7.2.0.GA.CR1
    • 7.0.0.ER6, 7.1.0.DR12, 7.1.0.CR2
    • ActiveMQ
    • None

    Description

      java.lang.IllegalStateException: Didn't get the expected number of bindings, look at the logging for more information
      	at org.apache.activemq.artemis.tests.integration.cluster.distribution.ClusterTestBase.waitForBindings(ClusterTestBase.java:460)
      	at org.apache.activemq.artemis.tests.integration.cluster.distribution.SymmetricClusterWithBackupTest.testMixtureLoadBalancedAndNonLoadBalancedQueuesAddQueuesAndConsumersBeforeAllServersAreStarted(SymmetricClusterWithBackupTest.java:223)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
      	at java.lang.reflect.Method.invoke(Method.java:507)
      	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
      	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
      	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
      	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
      	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
      	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
      	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
      	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
      	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
      	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
      	at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
      	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
      	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
      	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
      	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
      	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
      	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
      	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
      	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
      	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
      	at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
      	at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
      	at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
      	at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
      

      The test didn't get the expected number of bindings because bridges weren't created between all nodes. The test uses 5 live and 5 backup servers, which are started in sequence: Backup-0, Live-0, Backup-1, Live-1, and so on. Live-4 creates bridges to the other servers, but no server creates a bridge to Live-4. This happens when the following race condition occurs:

      • Each server sends a NODE_ANNOUNCE packet when it starts successfully.
      • The NODE_ANNOUNCE packet contains a nodeID and an eventID, which is actually a timestamp.
      • When a node receives a NODE_ANNOUNCE packet, it checks whether it already knows the node identified by nodeID. If it does, it updates its record of that node only if the received packet has a newer eventID (timestamp).
      • Both the live and the backup server create a record with an eventID in the method Topology.updateAsLive.
      • The backup updates this record in the method Topology.updateBackup.
      • These methods can execute in the following sequence:
      1. Backup.updateAsLive - creates the record with eventID 1
      2. Live.updateAsLive - creates the record with eventID 2
      3. Backup.updateBackup - updates the record's eventID to 3
      • After this sequence, the live server's eventID (2) is lower than the backup's eventID (3).
      • All nodes in the cluster first receive the NODE_ANNOUNCE packet from the backup, because the backup starts earlier than the live server.
      • When the live server sends its NODE_ANNOUNCE packet, all nodes ignore it because its eventID is lower than the backup's eventID (note: the live and the backup have the same nodeID).
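      The eventID rule and the interleaving above can be replayed in a small model. This is a minimal sketch, not Artemis code: the class and method names (`AnnounceRaceSketch`, `ReceiverTopology`, `onNodeAnnounce`, `senderOf`) and the nodeID `node-4` are hypothetical, and eventIDs are plain `long` values standing in for timestamps. As described above, a receiver accepts an announce only for an unknown nodeID or a strictly newer eventID.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical model of the NODE_ANNOUNCE race. Names are illustrative only and
 * are not the Artemis Topology or ClusterConnectionImpl API.
 */
final class AnnounceRaceSketch {

    /** What a receiving node remembers about one nodeID. */
    static final class MemberRecord {
        final String sender;
        final long eventId;

        MemberRecord(String sender, long eventId) {
            this.sender = sender;
            this.eventId = eventId;
        }
    }

    /** Receiver-side topology keyed by nodeID. */
    static final class ReceiverTopology {
        private final Map<String, MemberRecord> members = new HashMap<>();

        /** Accepts an announce only for an unknown nodeID or a strictly newer eventID. */
        boolean onNodeAnnounce(String nodeId, String sender, long eventId) {
            MemberRecord known = members.get(nodeId);
            if (known != null && eventId <= known.eventId) {
                return false; // not newer: the announce is ignored
            }
            members.put(nodeId, new MemberRecord(sender, eventId));
            return true;
        }

        /** Returns which server the receiver currently associates with the nodeID. */
        String senderOf(String nodeId) {
            MemberRecord r = members.get(nodeId);
            return r == null ? null : r.sender;
        }
    }

    public static void main(String[] args) {
        // eventIDs produced by the interleaving in the description
        long backupEventId = 1; // 1. Backup.updateAsLive
        long liveEventId = 2;   // 2. Live.updateAsLive
        backupEventId = 3;      // 3. Backup.updateBackup

        ReceiverTopology receiver = new ReceiverTopology();
        // Live and backup share a nodeID; the backup starts first, so it announces first.
        boolean backupAccepted = receiver.onNodeAnnounce("node-4", "Backup-4", backupEventId);
        boolean liveAccepted = receiver.onNodeAnnounce("node-4", "Live-4", liveEventId);

        System.out.println("backup announce accepted: " + backupAccepted);
        System.out.println("live announce accepted: " + liveAccepted);
        System.out.println("record kept for node-4: " + receiver.senderOf("node-4"));
    }
}
```

      Running `main` reports that the backup's announce is accepted, the live's announce is ignored, and the record kept for `node-4` still points at `Backup-4`, which is the stale state described above.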

      Solution: in the method ClusterConnectionImpl.onConnection, we should update the eventID of localMember before sending the NODE_ANNOUNCE packet.
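      The proposed change can be illustrated with a similar model. This is an assumption-laden sketch, not the actual `ClusterConnectionImpl.onConnection` change: `AnnounceFixSketch`, `LocalMember`, `Receiver`, `announce` and `replay` are hypothetical, and a strictly increasing counter replaces the wall-clock timestamp so the result is deterministic. The only difference between the two runs is whether the local member's eventID is refreshed immediately before the announce is sent.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of refreshing the local member's eventID before announcing. */
final class AnnounceFixSketch {

    /** Stand-in for the timestamp-based eventID: strictly increasing and deterministic. */
    static final AtomicLong CLOCK = new AtomicLong();

    /** Hypothetical local member record; only the fields this sketch needs. */
    static final class LocalMember {
        final String nodeId;
        final String name;
        long eventId;

        LocalMember(String nodeId, String name) {
            this.nodeId = nodeId;
            this.name = name;
        }
    }

    /** Receiver side: an announce wins only with an unknown nodeID or a newer eventID. */
    static final class Receiver {
        final Map<String, Long> eventIds = new HashMap<>();

        boolean onNodeAnnounce(String nodeId, long eventId) {
            Long known = eventIds.get(nodeId);
            if (known != null && eventId <= known) {
                return false; // stale announce is ignored
            }
            eventIds.put(nodeId, eventId);
            return true;
        }
    }

    /** Sends an announce, optionally refreshing the member's eventID first. */
    static boolean announce(LocalMember member, Receiver receiver, boolean refreshEventId) {
        if (refreshEventId) {
            member.eventId = CLOCK.incrementAndGet(); // the proposed refresh
        }
        return receiver.onNodeAnnounce(member.nodeId, member.eventId);
    }

    /** Replays the racy ordering and returns whether the live's announce was accepted. */
    static boolean replay(boolean refreshEventId) {
        LocalMember backup = new LocalMember("node-4", "Backup-4");
        LocalMember live = new LocalMember("node-4", "Live-4");
        backup.eventId = CLOCK.incrementAndGet(); // Backup.updateAsLive
        live.eventId = CLOCK.incrementAndGet();   // Live.updateAsLive
        backup.eventId = CLOCK.incrementAndGet(); // Backup.updateBackup

        Receiver receiver = new Receiver();
        announce(backup, receiver, refreshEventId); // backup starts first
        return announce(live, receiver, refreshEventId);
    }

    public static void main(String[] args) {
        System.out.println("without refresh, live accepted: " + replay(false));
        System.out.println("with refresh, live accepted: " + replay(true));
    }
}
```

      Here `replay(false)` returns `false` (the live's announce is ignored) and `replay(true)` returns `true`. With real timestamps, the refresh relies on the live server announcing after its backup, which matches the start order used in this test.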

            People

              jondruse@redhat.com Jiri Ondrusek
              eduda_jira Erich Duda (Inactive)
              Votes: 0
              Watchers: 3
