Standard multiple failover/failback scenario
Two servers are in Replicated HA topology server 1 is live server2 is a backup.
clients are sending and receiving large messages while 50 failovers with failback are done via server1 kill and start.
After some time restarting of the backup server after failback fails because of error while moving LM to the oldreplica directory.
we can see folowing error:
12:22:04,769 ERROR [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ224000: Failure in initialisation: java.nio.file.FileSystemException: I:\messaging-testsuite\scripts\server2\jboss-eap\standalone\data\..\..\..\..\hornetq-journal-B\largemessages\17179887281.msg -> I:\messaging-testsuite\scripts\server2\jboss-eap\standalone\data\..\..\..\..\hornetq-journal-B\largemessages\oldreplica.5\17179887281.msg: The process cannot access the file because it is being used by another process. at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) [rt.jar:1.8.0_144] at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) [rt.jar:1.8.0_144] at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387) [rt.jar:1.8.0_144] at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) [rt.jar:1.8.0_144] at java.nio.file.Files.move(Files.java:1395) [rt.jar:1.8.0_144] at org.apache.activemq.artemis.core.server.files.FileMoveManager.doMove(FileMoveManager.java:117) at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.moveServerData(ActiveMQServerImpl.java:2438) at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:111) at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2496)
Before this error there is an warning about journal state when loading the message:
12:22:04,598 TRACE [org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) Create file for message: LargeServerMessage[messageID=17179887281,durable=true,userID=727f7005-a2a4-11e7-ab18-080027c3f38b,priority=4, timestamp=Tue Sep 26 12:21:23 CEST 2017,expiration=0, durable=true, address=jms.queue.testQueue0,properties=TypedProperties[__AMQ_CID=99f3c21d-a2a2-11e7-ab18-080027c3f38b,count=3954,counter=11876,_AMQ_DUPL_ID=88996bf5-6cd3-45a9-bc9a-1d8d88ad8669,_AMQ_LARGE_SIZE=400005]]@1130072951 12:22:04,598 TRACE [org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) Creating file for LM with messageID17179887281 12:22:04,598 TRACE [org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) OPEN_FILE openFile 17179887281.msg 12:22:04,598 TRACE [org.apache.activemq.artemis.core.replication.ReplicatedJournal] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) Append record id = 17179887282 recordType = 29 12:22:04,598 WARN [org.apache.activemq.artemis.core.server] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) AMQ222053: Error on copying large message LargeServerMessage[messageID=17179886615,durable=true,userID=727f7005-a2a4-11e7-ab18-080027c3f38b,priority=4, timestamp=Tue Sep 26 12:21:23 CEST 2017,expiration=0, durable=true, address=jms.queue.testQueue0,properties=TypedProperties[__AMQ_CID=99f3c21d-a2a2-11e7-ab18-080027c3f38b,count=3954,counter=11876,_AMQ_DUPL_ID=88996bf5-6cd3-45a9-bc9a-1d8d88ad8669,_AMQ_LARGE_SIZE=400005]]@2137629247 for DLA or Expiry: java.lang.IllegalStateException: Journal must be in state=LOADED, was [STOPPED] at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkJournalIsLoaded(JournalImpl.java:968) [artemis-journal-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1] at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendAddRecord(JournalImpl.java:729) [artemis-journal-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1] at org.apache.activemq.artemis.core.replication.ReplicatedJournal.appendAddRecord(ReplicatedJournal.java:115) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.storePendingLargeMessage(JournalStorageManager.java:391) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.createLargeMessage(JournalStorageManager.java:486) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.copy(LargeServerMessageImpl.java:277) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.redistribute(PostOfficeImpl.java:761) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.core.server.cluster.impl.Redistributor.handle(Redistributor.java:142) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:2603) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2000) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$1700(QueueImpl.java:101) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:2789) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008] at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:122) [artemis-commons-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_144] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_144] at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_144]
From what I was able to debug, the root cause of this issue is that
LargeServerMessageImpl.validate() opens the file and
when IllegalStateException (AMQ222053: Error on copying large message) is thrown in
LargeServerMessageImpl.copy(), the file is never closed.
This is not a problem on Linux machines as policy for moving opened files is more benevolent.
- is cloned by
-
ENTMQBR-2134 Moving large messages during failback in replicated topology on Windows may fail
- Closed