Uploaded image for project: 'AMQ Broker'
  1. AMQ Broker
  2. ENTMQBR-2134

Moving large messages during failback in replicated topology on Windows may fail

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Obsolete
    • Icon: Major Major
    • None
    • AMQ 7.2.1.GA
    • broker-core
    • None
    • Hide

      (in cygwin)

      git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git

      cd eap-tests-hornetq/scripts/

      groovy -DEAP_VERSION=7.1.0.CR2 PrepareServers7.groovy

        1. You will need to update path manually this is jus an example, do not use $PWD, use native windows path
          export JBOSS_HOME_1=I:\\eap-tests-hornetq\\scripts\\server1
          jboss-eap
          export JBOSS_HOME_2=I:\\meap-tests-hornetq\\scripts\\server2
          jboss-eap
          export JBOSS_HOME_3=I:\\eap-tests-hornetq\\scripts\\server3
          jboss-eap
          export JBOSS_HOME_4=I:\\eap-tests-hornetq\\scripts\\server4
          jboss-eap

      cd ../jboss-hornetq-testsuite/

      mvn clean test -Dtest=ReplicatedDedicatedFailoverTestCase#testMultipleFailoverWithQueueKill -DfailIfNoTests=false -Deap=7x | tee log

      Show
      (in cygwin) git clone git://git.app.eng.bos.redhat.com/jbossqe/eap-tests-hornetq.git cd eap-tests-hornetq/scripts/ groovy -DEAP_VERSION=7.1.0.CR2 PrepareServers7.groovy You will need to update path manually this is jus an example, do not use $PWD, use native windows path export JBOSS_HOME_1=I:\\eap-tests-hornetq\\scripts\\server1 jboss-eap export JBOSS_HOME_2=I:\\meap-tests-hornetq\\scripts\\server2 jboss-eap export JBOSS_HOME_3=I:\\eap-tests-hornetq\\scripts\\server3 jboss-eap export JBOSS_HOME_4=I:\\eap-tests-hornetq\\scripts\\server4 jboss-eap cd ../jboss-hornetq-testsuite/ mvn clean test -Dtest=ReplicatedDedicatedFailoverTestCase#testMultipleFailoverWithQueueKill -DfailIfNoTests=false -Deap=7x | tee log

      Standard multiple failover/failback scenario
      Two servers are in Replicated HA topology server 1 is live server2 is a backup.
      clients are sending and receiving large messages while 50 failovers with failback are done via server1 kill and start.

      After some time restarting of the backup server after failback fails because of error while moving LM to the oldreplica directory.

      we can see folowing error:

      12:22:04,769 ERROR [org.apache.activemq.artemis.core.server] (AMQ119000: Activation for server ActiveMQServerImpl::serverUUID=null) AMQ224000: Failure in initialisation: java.nio.file.FileSystemException: I:\messaging-testsuite\scripts\server2\jboss-eap\standalone\data\..\..\..\..\hornetq-journal-B\largemessages\17179887281.msg -> I:\messaging-testsuite\scripts\server2\jboss-eap\standalone\data\..\..\..\..\hornetq-journal-B\largemessages\oldreplica.5\17179887281.msg: The process cannot access the file because it is being used by another process.
      
      	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:86) [rt.jar:1.8.0_144]
      	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:97) [rt.jar:1.8.0_144]
      	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:387) [rt.jar:1.8.0_144]
      	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:287) [rt.jar:1.8.0_144]
      	at java.nio.file.Files.move(Files.java:1395) [rt.jar:1.8.0_144]
      	at org.apache.activemq.artemis.core.server.files.FileMoveManager.doMove(FileMoveManager.java:117)
      	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.moveServerData(ActiveMQServerImpl.java:2438)
      	at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:111)
      	at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2496)
      

      Before this error there is an warning about journal state when loading the message:

      12:22:04,598 TRACE [org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) Create file for message: LargeServerMessage[messageID=17179887281,durable=true,userID=727f7005-a2a4-11e7-ab18-080027c3f38b,priority=4, timestamp=Tue Sep 26 12:21:23 CEST 2017,expiration=0, durable=true, address=jms.queue.testQueue0,properties=TypedProperties[__AMQ_CID=99f3c21d-a2a2-11e7-ab18-080027c3f38b,count=3954,counter=11876,_AMQ_DUPL_ID=88996bf5-6cd3-45a9-bc9a-1d8d88ad8669,_AMQ_LARGE_SIZE=400005]]@1130072951
      12:22:04,598 TRACE [org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) Creating file for LM with messageID17179887281
      12:22:04,598 TRACE [org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) OPEN_FILE openFile 17179887281.msg
      12:22:04,598 TRACE [org.apache.activemq.artemis.core.replication.ReplicatedJournal] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) Append record id = 17179887282 recordType = 29
      12:22:04,598 WARN  [org.apache.activemq.artemis.core.server] (Thread-3 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@31355de6)) AMQ222053: Error on copying large message LargeServerMessage[messageID=17179886615,durable=true,userID=727f7005-a2a4-11e7-ab18-080027c3f38b,priority=4, timestamp=Tue Sep 26 12:21:23 CEST 2017,expiration=0, durable=true, address=jms.queue.testQueue0,properties=TypedProperties[__AMQ_CID=99f3c21d-a2a2-11e7-ab18-080027c3f38b,count=3954,counter=11876,_AMQ_DUPL_ID=88996bf5-6cd3-45a9-bc9a-1d8d88ad8669,_AMQ_LARGE_SIZE=400005]]@2137629247 for DLA or Expiry: java.lang.IllegalStateException: Journal must be in state=LOADED, was [STOPPED]
      	at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkJournalIsLoaded(JournalImpl.java:968) [artemis-journal-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
      	at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendAddRecord(JournalImpl.java:729) [artemis-journal-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
      	at org.apache.activemq.artemis.core.replication.ReplicatedJournal.appendAddRecord(ReplicatedJournal.java:115) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.storePendingLargeMessage(JournalStorageManager.java:391) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.createLargeMessage(JournalStorageManager.java:486) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.copy(LargeServerMessageImpl.java:277) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.redistribute(PostOfficeImpl.java:761) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.core.server.cluster.impl.Redistributor.handle(Redistributor.java:142) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.core.server.impl.QueueImpl.handle(QueueImpl.java:2603) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliver(QueueImpl.java:2000) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.core.server.impl.QueueImpl.access$1700(QueueImpl.java:101) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.core.server.impl.QueueImpl$DeliverRunner.run(QueueImpl.java:2789) [artemis-server-1.5.5.008-redhat-1.jar:1.5.5.jbossorg-008]
      	at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:122) [artemis-commons-1.5.5.008-redhat-1.jar:1.5.5.008-redhat-1]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_144]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_144]
      	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_144]
      

      From what I was able to debug, the root cause of this issue is that

      LargeServerMessageImpl.validate() opens the file and
      when IllegalStateException (AMQ222053: Error on copying large message) is thrown in
      LargeServerMessageImpl.copy(), the file is never closed.

      This is not a problem on Linux machines as policy for moving opened files is more benevolent.

              mtaylor1@redhat.com Martyn Taylor (Inactive)
              mnovak1@redhat.com Miroslav Novak
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated:
                Resolved: