  1. JBoss Enterprise Application Platform
  2. JBEAP-2829

Replication of large journal leads to: "java.lang.OutOfMemoryError: Direct buffer memory"

Details

    • Bug
    • Resolution: Done
    • Blocker
    • 7.0.0.ER5
    • 7.0.0.ER4
    • ActiveMQ
    • None
    • Regression

    Description

      An OutOfMemoryError is thrown when a large journal is replicated from the live server to the backup. The EAP 7.0.0.ER4 servers are configured in a dedicated HA topology with a replicated journal.

      The OOME is thrown on the live server:

      08:59:03,687 ERROR [stderr] (Thread-116) Exception in thread "Thread-116" java.lang.OutOfMemoryError: Direct buffer memory
      08:59:03,696 ERROR [stderr] (Thread-116) 	at java.nio.Bits.reserveMemory(Bits.java:658)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at io.netty.buffer.UnpooledUnsafeDirectByteBuf.allocateDirect(UnpooledUnsafeDirectByteBuf.java:108)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at io.netty.buffer.UnpooledUnsafeDirectByteBuf.capacity(UnpooledUnsafeDirectByteBuf.java:157)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:250)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:858)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.buffers.impl.ChannelBufferWrapper.writeBytes(ChannelBufferWrapper.java:473)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.protocol.core.impl.wireformat.ReplicationSyncFileMessage.encodeRest(ReplicationSyncFileMessage.java:129)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.protocol.core.impl.PacketImpl.encode(PacketImpl.java:277)
      08:59:03,696 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:225)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.protocol.core.impl.ChannelImpl.send(ChannelImpl.java:201)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.replication.ReplicationManager.sendReplicatePacket(ReplicationManager.java:334)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.replication.ReplicationManager.sendReplicatePacket(ReplicationManager.java:318)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.replication.ReplicationManager.sendLargeFile(ReplicationManager.java:506)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.replication.ReplicationManager.syncPages(ReplicationManager.java:457)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.sendPages(PagingStoreImpl.java:1052)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.sendPagesToBackup(JournalStorageManager.java:482)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.startReplication(JournalStorageManager.java:376)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at org.apache.activemq.artemis.core.server.impl.SharedNothingLiveActivation$2.run(SharedNothingLiveActivation.java:160)
      08:59:03,697 ERROR [stderr] (Thread-116) 	at java.lang.Thread.run(Thread.java:745)
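
The trace above fails in java.nio.Bits.reserveMemory, which means the allocation went through ByteBuffer.allocateDirect and is counted against the JVM's direct buffer pool. As a diagnostic aid while reproducing, the following is a minimal sketch that reads that pool's usage through the standard BufferPoolMXBean. The class name DirectMemoryProbe and the 1 MiB test allocation are illustrative assumptions, not part of EAP or Artemis. The probe reports only on the JVM it runs in.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Hypothetical helper for illustration; not part of EAP or ActiveMQ Artemis.
class DirectMemoryProbe {

    /** Returns the bytes currently used by the JVM's "direct" buffer pool, or -1 if the pool is not found. */
    static long directMemoryUsed() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directMemoryUsed();
        // Allocate 1 MiB of direct memory so the change in pool usage is visible.
        ByteBuffer buf = ByteBuffer.allocateDirect(1024 * 1024);
        long after = directMemoryUsed();
        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after:  " + after + " bytes (holding a " + buf.capacity() + "-byte buffer)");
    }
}
```

The same figures are exposed by the platform MBean java.nio:type=BufferPool,name=direct, so a JMX client attached to the live server should be able to watch the pool grow during synchronization, assuming JMX access to that server is available.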
      

      To reproduce this issue, download the zip with the prepared EAP 7.0.0.ER4 servers:

      scp jbossqa@10.40.4.81:/home/jbossqa/tmp/oome-servers.zip .  # check password in private comment
      unzip oome-servers.zip
      cd server1/jboss-eap/bin
      sh standalone.sh -c standalone-full-ha.xml
      # in another console go to server2 and start it
      cd server2/jboss-eap/bin
      sh standalone.sh -c standalone-full-ha.xml -Djboss.socket.binding.port-offset=1000 
      

      If the OOME does not appear, trigger failover and then failback several times by shutting down server1 and starting it again.

      Customer impact:
      An OOME occurs while the live server is synchronizing with the backup. Synchronization fails, so later failover and failback will not happen. This not only breaks HA but also crashes the former live server, which can no longer serve JMS clients. The result is service unavailability in production.

            People

              rh-ee-ataylor Andy Taylor
              mnovak1@redhat.com Miroslav Novak