An OutOfMemoryError is thrown when a large journal is replicated from the live server to the backup. The EAP 7.0.0.ER4 servers are configured in a dedicated HA topology with a replicated journal.
The OOME is thrown on the live server while it synchronizes with the backup:
08:59:03,687 ERROR [stderr] (Thread-116) Exception in thread "Thread-116" java.lang.OutOfMemoryError: Direct buffer memory
08:59:03,696 ERROR [stderr] (Thread-116) at java.nio.Bits.reserveMemory(Bits.java:658)
08:59:03,696 ERROR [stderr] (Thread-116) at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
08:59:03,696 ERROR [stderr] (Thread-116) at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
08:59:03,696 ERROR [stderr] (Thread-116) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
08:59:03,696 ERROR [stderr] (Thread-116) at io.netty.buffer.UnpooledUnsafeDirectByteBuf.capacity(UnpooledUnsafeDirectByteBuf.java:157)
08:59:03,696 ERROR [stderr] (Thread-116) at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:250)
08:59:03,696 ERROR [stderr] (Thread-116) at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:858)
08:59:03,696 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.buffers.impl.ChannelBufferWrapper.writeBytes(ChannelBufferWrapper.java:473)
08:59:03,696 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.protocol.core.impl.wireformat.ReplicationSyncFileMessage.encodeRest(ReplicationSyncFileMessage.java:129)
08:59:03,696 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.protocol.core.impl.PacketImpl.encode(PacketImpl.java:277)
08:59:03,696 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:225)
08:59:03,697 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:201)
08:59:03,697 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.replication.ReplicationManager.sendReplicatePacket(ReplicationManager.java:334)
08:59:03,697 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.replication.ReplicationManager.sendReplicatePacket(ReplicationManager.java:318)
08:59:03,697 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.replication.ReplicationManager.sendLargeFile(ReplicationManager.java:506)
08:59:03,697 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.replication.ReplicationManager.syncPages(ReplicationManager.java:457)
08:59:03,697 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.sendPages(PagingStoreImpl.java:1052)
08:59:03,697 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.sendPagesToBackup(JournalStorageManager.java:482)
08:59:03,697 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:376)
08:59:03,697 ERROR [stderr] (Thread-116) at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:160)
08:59:03,697 ERROR [stderr] (Thread-116) at java.lang.Thread.run(Thread.java:745)
To reproduce this issue, download the zip with the prepared EAP 7.0.0.ER4 servers:
scp jbossqa@10.40.4.81:/home/jbossqa/tmp/oome-servers.zip . # check password in private comment
unzip oome-servers.zip
cd server1/jboss-eap/bin
sh standalone.sh -c standalone-full-ha.xml
# in another console, from the directory where the zip was unzipped, start server2
cd server2/jboss-eap/bin
sh standalone.sh -c standalone-full-ha.xml -Djboss.socket.binding.port-offset=1000
If the OOME does not appear, trigger failover and then failback several times by shutting down server1 and starting it again.
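The repeated failover/failback in the last step can be automated. The following is a minimal POSIX sh sketch, not part of the original reproducer. It assumes it is sourced from the directory where oome-servers.zip was unzipped, with both servers already started as described; that each server writes to the default standalone/log/server.log; and that server1 keeps the default management port 9990, which is what `jboss-cli.sh --connect` targets. The function names `oome_seen` and `failover_failback_loop` and the `START_WAIT`/`STOP_WAIT` timings are hypothetical and will likely need tuning to the size of the journal.

```shell
#!/bin/sh
# Hypothetical helper for the failover/failback reproducer.
# Assumptions: sourced from the unzip directory, both servers already running,
# default log locations, server1 on the default management port (no offset).

# Returns 0 (and prints the file name) if any given log contains the
# direct-memory OOME from this report; missing files are skipped.
oome_seen() {
  for log in "$@"; do
    if [ -f "$log" ] && grep -q "java.lang.OutOfMemoryError: Direct buffer memory" "$log"; then
      echo "OOME found in $log"
      return 0
    fi
  done
  return 1
}

# Shuts server1 down (failover to server2), starts it again (failback),
# then checks both logs; repeats up to $1 times (default 5).
failover_failback_loop() {
  iterations=${1:-5}
  start_wait=${START_WAIT:-90}   # seconds to wait for start + journal sync (a guess)
  stop_wait=${STOP_WAIT:-30}     # seconds to wait for server2 to take over (a guess)
  logs="server1/jboss-eap/standalone/log/server.log server2/jboss-eap/standalone/log/server.log"
  i=1
  while [ "$i" -le "$iterations" ]; do
    echo "Iteration $i: shutting down server1"
    sh server1/jboss-eap/bin/jboss-cli.sh --connect --command=:shutdown
    sleep "$stop_wait"
    echo "Iteration $i: starting server1 again"
    (cd server1/jboss-eap/bin && sh standalone.sh -c standalone-full-ha.xml >/dev/null 2>&1) &
    sleep "$start_wait"
    # $logs is left unquoted on purpose so it splits into the two paths.
    if oome_seen $logs; then
      return 0
    fi
    i=$((i + 1))
  done
  echo "No OOME after $iterations iterations"
  return 1
}
```

Usage, with a hypothetical file name: `. ./failover-loop.sh; failover_failback_loop 10`. Both logs are checked because, once server2 has taken over, it is the server replicating to server1 when server1 restarts, so the OOME may show up on either side.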
Customer impact:
While the live server synchronizes with the backup, the OOME occurs. Synchronization fails, so subsequent failover/failback does not happen. This not only breaks HA but also crashes the former live server, which can then no longer serve JMS clients. In production this leads to service unavailability.
Issue links:
- is blocked by: JBEAP-2499 Upgrade Artemis to 1.1.0.wildfly-011 (Closed)
- relates to: ARTEMIS-350