-
Bug
-
Resolution: Done
-
Major
-
7.0.0.ER6, 7.1.0.DR12, 7.1.0.CR2
-
None
java.lang.IllegalStateException: Didn't get the expected number of bindings, look at the logging for more information
at org.apache.activemq.artemis.tests.integration.cluster.distribution.ClusterTestBase.waitForBindings(ClusterTestBase.java:460)
at org.apache.activemq.artemis.tests.integration.cluster.distribution.SymmetricClusterWithBackupTest.testMixtureLoadBalancedAndNonLoadBalancedQueuesAddQueuesAndConsumersBeforeAllServersAreStarted(SymmetricClusterWithBackupTest.java:223)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:95)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:507)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:283)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:173)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:128)
at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:203)
at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:155)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
The test did not get the expected number of bindings because bridges were not created between all nodes. The test uses 5 live and 5 backup servers, started in sequence: Backup-0, Live-0, Backup-1, Live-1, and so on. Live-4 creates bridges to the other servers, but no server creates a bridge to Live-4. This happens when the following race condition occurs:
- Each server sends a NODE_ANNOUNCE packet once it has started successfully
- The NODE_ANNOUNCE packet contains a nodeID and an eventID, which is actually a timestamp
- When a node receives a NODE_ANNOUNCE packet, it checks whether it already knows the node identified by nodeID. If it does, it updates its record for that node only if the received packet has a newer eventID (timestamp).
- Both Live and Backup create the record with an eventID in the method Topology.updateAsLive
- Backup updates this record in the method Topology.updateBackup
- These methods can be executed in the following sequence:
  - Backup.updateAsLive creates the record with eventID 1
  - Live.updateAsLive creates the record with eventID 2
  - Backup.updateBackup updates the eventID of the record to 3
- After this sequence, Live has eventID 2, which is lower than the eventID of Backup
- All nodes in the cluster first receive the NODE_ANNOUNCE packet from Backup, because it is started earlier than Live
- When Live sends its NODE_ANNOUNCE packet, all nodes ignore it because the eventID of Live is lower than the eventID of Backup (note: Live and Backup share the same nodeID)
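The sequence above can be reproduced in a small standalone model. This is only a sketch, not Artemis code: `RemoteTopologyView`, `onNodeAnnounce`, the node name `node-0`, and the integer clock are hypothetical stand-ins, and the only behavior taken from the description is that an announce is applied only when its eventID is newer than the stored one.

```java
import java.util.HashMap;
import java.util.Map;

public class TopologyRaceSketch {

    // Hypothetical stand-in for how a remote node tracks cluster members.
    static final class RemoteTopologyView {
        private final Map<String, Long> lastEventIdByNode = new HashMap<>();

        // Returns true if the announce was applied, false if it was ignored.
        boolean onNodeAnnounce(String nodeId, long eventId) {
            Long known = lastEventIdByNode.get(nodeId);
            if (known != null && eventId <= known) {
                return false; // not newer than what we already have: ignore
            }
            lastEventIdByNode.put(nodeId, eventId);
            return true;
        }
    }

    // Replays the interleaving from the description with a fake clock.
    static boolean[] simulateRace() {
        long clock = 0;
        long backupEventId = ++clock; // Backup.updateAsLive -> eventID 1
        long liveEventId = ++clock;   // Live.updateAsLive   -> eventID 2
        backupEventId = ++clock;      // Backup.updateBackup -> eventID 3

        RemoteTopologyView remote = new RemoteTopologyView();
        // Backup starts first, so its announce arrives first.
        boolean backupAccepted = remote.onNodeAnnounce("node-0", backupEventId);
        // Live shares the nodeID but carries the older eventID.
        boolean liveAccepted = remote.onNodeAnnounce("node-0", liveEventId);
        return new boolean[] {backupAccepted, liveAccepted};
    }

    public static void main(String[] args) {
        boolean[] result = simulateRace();
        System.out.println("backup announce accepted: " + result[0]);
        System.out.println("live announce accepted:   " + result[1]);
    }
}
```

In this model the Backup announce is applied and the Live announce is dropped, which matches the missing bridges to Live-4 seen in the test.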
Solution: In the method ClusterConnectionImpl.onConnection, we should update the eventID of localMember before we send the NODE_ANNOUNCE packet.
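A minimal sketch of the idea behind this fix, under the same simplified model: the sender refreshes its own eventID immediately before announcing, so the announce is newer than anything the Backup published earlier. The names `LocalMember`, `announce`, and `liveAnnounceAccepted`, and the literal eventID values, are hypothetical; the real change lives in ClusterConnectionImpl.onConnection and is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;

public class AnnounceFixSketch {

    // Hypothetical stand-in for the local member record held by a server.
    static final class LocalMember {
        final String nodeId;
        long eventId;

        LocalMember(String nodeId, long eventId) {
            this.nodeId = nodeId;
            this.eventId = eventId;
        }
    }

    // Sends an announce to a remote view (nodeID -> last seen eventID).
    // When refreshFirst is true, the eventID is updated to "now" before sending.
    static boolean announce(Map<String, Long> remoteView, LocalMember member,
                            long now, boolean refreshFirst) {
        if (refreshFirst) {
            member.eventId = now; // the proposed fix: refresh before announcing
        }
        Long known = remoteView.get(member.nodeId);
        if (known != null && member.eventId <= known) {
            return false; // remote side ignores a stale announce
        }
        remoteView.put(member.nodeId, member.eventId);
        return true;
    }

    // Remote nodes already saw the Backup announce with eventID 3;
    // Live still holds eventID 2 from Topology.updateAsLive.
    static boolean liveAnnounceAccepted(boolean refreshFirst) {
        Map<String, Long> remoteView = new HashMap<>();
        remoteView.put("node-0", 3L);
        LocalMember live = new LocalMember("node-0", 2L);
        return announce(remoteView, live, 4L, refreshFirst);
    }

    public static void main(String[] args) {
        System.out.println("without refresh: " + liveAnnounceAccepted(false));
        System.out.println("with refresh:    " + liveAnnounceAccepted(true));
    }
}
```

Without the refresh, the Live announce is ignored; with it, the announce carries a newer eventID and is applied.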
- blocks
-
JBEAP-2574 Stabilization of ActiveMQ Artemis upstream test suite
- Closed
- is related to
-
JBEAP-13637 Live's topology update may be ignored
- Closed
-
JBEAP-16026 [QA](7.1.z) Live's topology update may be ignored
- Closed
- is caused by
-
ARTEMIS-1484