-
Bug
-
Resolution: Cannot Reproduce
-
Blocker
-
13.0.0.Beta1
In messaging HA scenarios, tests sometimes fail because of exception [1]. The exception is thrown from ServerLocatorImpl::createSessionFactory, which waits for the initial broadcast from the cluster; the broadcast is not received within the 10-second timeout. In the following scenario the issue causes split brain: both live and backup are active at the same time.
Scenario
- Two WildFly/Artemis servers are configured as a Live-Backup pair
- The Live server is killed
- The Backup server becomes the new Live server
- The Live server is restarted and failback is performed
Expectation: Failback completes successfully. The original Live becomes Live again and the original Backup becomes Backup again.
Reality: After the Live is restarted, it does not detect that another active Live with the same nodeId already exists, so it activates (becomes Live). Both Live and Backup are then active at the same time.
Technical details
I found that the original Live server does not detect that its Backup is active because SharedNothingLiveActivation::isNodeIdUsed returns false. I added more logging to this method and found that false is returned from line [2] after exception [1] is thrown.
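The failure mode described above can be modeled in isolation. The following is a minimal sketch, not the Artemis implementation: the class FailbackCheckSketch, the Discovery helper, and the timeout values are hypothetical and only illustrate the pattern in which a timed-out wait for the first cluster broadcast is caught and turned into a "node ID not in use" answer, which lets the restarted server activate.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model of the reported behaviour; none of these names exist in Artemis.
public class FailbackCheckSketch {

    /** Stand-in for a discovery mechanism that signals when the first broadcast arrives. */
    static final class Discovery {
        private final CountDownLatch firstBroadcast = new CountDownLatch(1);

        /** Called when a broadcast from another cluster member is received. */
        void onBroadcast() {
            firstBroadcast.countDown();
        }

        /** Blocks until the first broadcast arrives or the timeout elapses. */
        void awaitInitialBroadcast(long timeoutMillis)
                throws TimeoutException, InterruptedException {
            if (!firstBroadcast.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                throw new TimeoutException(
                        "Timed out waiting to receive initial broadcast from cluster");
            }
        }
    }

    /**
     * Simplified check: any received broadcast is treated as "a live server with
     * our node ID exists". A timeout is swallowed and reported as "not used",
     * which is the step that allows both servers to become active.
     */
    static boolean isNodeIdUsed(Discovery discovery, long timeoutMillis) {
        try {
            discovery.awaitInitialBroadcast(timeoutMillis);
            return true;
        } catch (TimeoutException e) {
            return false; // no broadcast seen: caller proceeds to activate
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Discovery silent = new Discovery(); // broadcast never arrives
        System.out.println("no broadcast -> nodeIdUsed=" + isNodeIdUsed(silent, 100));

        Discovery announced = new Discovery();
        announced.onBroadcast(); // broadcast arrived before the check
        System.out.println("broadcast    -> nodeIdUsed=" + isNodeIdUsed(announced, 100));
    }
}
```

In this model a missing broadcast is indistinguishable from "no other live server exists", so a slow or lost discovery message is enough to produce the split brain seen in the scenario.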
Investigation report
I tried downgrading Artemis to version 1.5.5.jbossorg-010 but hit the same issue, so the issue is not caused by the recent Artemis upgrade.
It is a regression against EAP 7.1.0.GA but not a regression against WildFly 12.
We see the issue only with JGroups. I ran the test with Netty discovery and it passed 50 times in a row.
[1]
ActiveMQConnectionTimedOutException[errorType=CONNECTION_TIMEDOUT message=AMQ119012: Timed out waiting to receive initial broadcast from cluster]
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.createSessionFactory(ServerLocatorImpl.java:749) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connect(ServerLocatorImpl.java:627) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.connectNoWarnings(ServerLocatorImpl.java:633) [artemis-core-client-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
    at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation.isNodeIdUsed(SharedNothingLiveActivation.java:280) [artemis-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
    at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation.run(SharedNothingLiveActivation.java:90) [artemis-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.internalStart(ActiveMQServerImpl.java:539) [artemis-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.start(ActiveMQServerImpl.java:485) [artemis-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
    at org.apache.activemq.artemis.jms.server.impl.JMSServerManagerImpl.start(JMSServerManagerImpl.java:413) [artemis-jms-server-1.5.5.jbossorg-012.jar:1.5.5.jbossorg-SNAPSHOT]
    at org.wildfly.extension.messaging.activemq.jms.JMSService.doStart(JMSService.java:205) [wildfly-messaging-activemq-13.0.0.Beta2-SNAPSHOT.jar:13.0.0.Beta2-SNAPSHOT]
    at org.wildfly.extension.messaging.activemq.jms.JMSService.access$000(JMSService.java:64) [wildfly-messaging-activemq-13.0.0.Beta2-SNAPSHOT.jar:13.0.0.Beta2-SNAPSHOT]
    at org.wildfly.extension.messaging.activemq.jms.JMSService$1.run(JMSService.java:99) [wildfly-messaging-activemq-13.0.0.Beta2-SNAPSHOT.jar:13.0.0.Beta2-SNAPSHOT]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [rt.jar:1.8.0_171]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_171]
    at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
    at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
    at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1378) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
    at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_171]
    at org.jboss.threads.JBossThread.run(JBossThread.java:485) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]