A regression against EAP 7.0.0.ER6 was hit in a scenario where the backup was synchronizing with the live server. Two EAP 7 servers are configured in a dedicated topology with a replicated journal. During synchronization a deadlock occurs on the live server. Attaching the full thread dump from the live server.
The deadlock was detected by "jstack" and points to the following threads:
Java stack information for the threads listed above:
===================================================
"Thread-101":
    at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.isWritable(NettyConnection.java:106)
    - waiting to lock <0x00000000fee15ca8> (a java.util.concurrent.LinkedBlockingDeque)
    at org.apache.activemq.artemis.spi.core.protocol.AbstractRemotingConnection.isWritable(AbstractRemotingConnection.java:55)
    at org.apache.activemq.artemis.core.replication.ReplicationManager.sendReplicatePacket(ReplicationManager.java:345)
    - locked <0x00000000fee16168> (a java.lang.Object)
    at org.apache.activemq.artemis.core.replication.ReplicationManager.sendReplicatePacket(ReplicationManager.java:329)
    at org.apache.activemq.artemis.core.replication.ReplicationManager.sendLargeFile(ReplicationManager.java:540)
    at org.apache.activemq.artemis.core.replication.ReplicationManager.syncLargeMessageFile(ReplicationManager.java:485)
    at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.sendLargeMessageFiles(JournalStorageManager.java:521)
    at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:384)
    at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:160)
    at java.lang.Thread.run(Thread.java:745)

"default I/O-15":
    at org.apache.activemq.artemis.core.replication.ReplicationManager.readyForWriting(ReplicationManager.java:380)
    - waiting to lock <0x00000000fee16168> (a java.lang.Object)
    at org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.fireReady(NettyConnection.java:126)
    - locked <0x00000000fee15ca8> (a java.util.concurrent.LinkedBlockingDeque)
    at org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptor$Listener.connectionReadyForWrites(NettyAcceptor.java:676)
    at org.apache.activemq.artemis.core.remoting.impl.netty.ActiveMQChannelHandler.channelWritabilityChanged(ActiveMQChannelHandler.java:61)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:366)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelWritabilityChanged(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.ChannelInboundHandlerAdapter.channelWritabilityChanged(ChannelInboundHandlerAdapter.java:119)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelWritabilityChanged(AbstractChannelHandlerContext.java:366)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelWritabilityChanged(AbstractChannelHandlerContext.java:348)
    at io.netty.channel.DefaultChannelPipeline.fireChannelWritabilityChanged(DefaultChannelPipeline.java:861)
    at io.netty.channel.ChannelOutboundBuffer.fireChannelWritabilityChanged(ChannelOutboundBuffer.java:589)
    at io.netty.channel.ChannelOutboundBuffer.setWritable(ChannelOutboundBuffer.java:555)
    at io.netty.channel.ChannelOutboundBuffer.decrementPendingOutboundBytes(ChannelOutboundBuffer.java:198)
    at io.netty.channel.ChannelOutboundBuffer.remove(ChannelOutboundBuffer.java:263)
    at org.xnio.netty.transport.AbstractXnioSocketChannel.doWrite(AbstractXnioSocketChannel.java:174)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush0(AbstractChannel.java:765)
    at org.xnio.netty.transport.AbstractXnioSocketChannel$AbstractXnioUnsafe.flush0(AbstractXnioSocketChannel.java:363)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.flush(AbstractChannel.java:733)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.flush(DefaultChannelPipeline.java:1127)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)
    at io.netty.channel.AbstractChannelHandlerContext.flush(AbstractChannelHandlerContext.java:644)
    at io.netty.channel.ChannelDuplexHandler.flush(ChannelDuplexHandler.java:117)
    at io.netty.channel.AbstractChannelHandlerContext.invokeFlush(AbstractChannelHandlerContext.java:663)
    at io.netty.channel.AbstractChannelHandlerContext.access$1500(AbstractChannelHandlerContext.java:32)
    at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:961)
    at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
    at org.xnio.nio.WorkerThread.safeRun(WorkerThread.java:580)
    at org.xnio.nio.WorkerThread.run(WorkerThread.java:464)

Found 1 deadlock.
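The dump shows a classic lock-order inversion: "Thread-101" holds the ReplicationManager monitor (<0x00000000fee16168>) while NettyConnection.isWritable() tries to take the connection's pending-writes deque (<0x00000000fee15ca8>), while the XNIO worker "default I/O-15" holds the deque inside NettyConnection.fireReady() and calls back into ReplicationManager.readyForWriting(), which waits for the ReplicationManager monitor. Below is a minimal sketch of that inversion; the class and field names are simplified stand-ins for illustration, not the actual Artemis sources.

import java.util.concurrent.LinkedBlockingDeque;

public class ReplicationDeadlockSketch {

    // Stand-in for the monitor at 0x00000000fee15ca8 (the connection's pending-writes deque).
    private final LinkedBlockingDeque<Runnable> pendingWrites = new LinkedBlockingDeque<>();

    // Stand-in for the monitor at 0x00000000fee16168 (the ReplicationManager's internal lock).
    private final Object replicationLock = new Object();

    // Path of "Thread-101": sendReplicatePacket() holds the replication lock and then
    // asks the connection whether it is writable, which synchronizes on the deque.
    void sendReplicatePacket() {
        synchronized (replicationLock) {       // first: replication lock
            synchronized (pendingWrites) {     // second: connection deque
                // isWritable() check / register the ready callback
            }
        }
    }

    // Path of "default I/O-15": the writability-changed callback locks the deque first,
    // then calls back into readyForWriting(), which needs the replication lock.
    // Opposite acquisition order -> deadlock.
    void channelWritabilityChanged() {
        synchronized (pendingWrites) {         // first: connection deque
            synchronized (replicationLock) {   // second: replication lock
                // drain pending writes / resume sending replication packets
            }
        }
    }

    public static void main(String[] args) {
        ReplicationDeadlockSketch s = new ReplicationDeadlockSketch();
        new Thread(s::sendReplicatePacket, "Thread-101").start();
        new Thread(s::channelWritabilityChanged, "default I/O-15").start();
        // Under the right interleaving both threads block forever, matching the jstack report.
    }
}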
Customer impact:
The deadlock prevents all JMS clients from sending or receiving messages. The EAP 7 server configured as live cannot be cleanly shut down and must be killed.
Note: There is another issue with the replicated journal - JBEAP-3900 "Split Brain issue with Replication" - but this one does not appear to be caused by it, as there are NO log messages like:
AMQ212034: There are more than one servers on the network broadcasting the same node id. You will see this message exactly once (per node) if a node is restarted, in which case it can be safely ignored. But if it is logged continuously it means you really do have more than one node on the same network active concurrently with the same node id. This could occur if you have a backup node active at the same time as its live node. nodeID=49ed198c-eb59-11e5-86fb-d3a98519ea5e
Issue links: duplicates JBEAP-3900 "Split Brain issue with Replication" (Closed)