Uploaded image for project: 'WildFly'
  1. WildFly
  2. WFLY-11404

Artemis throws Critical IO Error if new journal file is not created in 5 seconds

XMLWordPrintable

    • Icon: Enhancement Enhancement
    • Resolution: Done
    • Icon: Critical Critical
    • 17.0.0.Alpha1, 17.0.0.Final
    • 15.0.0.Beta1
    • JMS
    • None

      I can see in our CI jobs that Artemis sometimes stops because of error [1]. I looked at the code [2] where the exception is thrown and I think it could be improved a bit.

      Customer Impact: If Artemis journal is located on slower file system (like NFS) then if server is under load then it might crash. This will lead to unavailability of service. Server must be restarted to recover.

      First thing I noticed is that the 5 seconds timeout is not configurable. I agree that it should be enough in most cases but if someone would want to use NFS for Artemis journal and he doesn't care about performance, we should able him to tune this value. Additionally the timeout doesn't reflect size of journal files.

      Second thing is that when openedFiles.poll() returns null we can't be sure whether it is problem of exhausted disc or exhausted CPU. I think there should be added some kind of latch which would wait until pushOpenRunnable is executed. It will make sure that there is issue with IO operations and it was not caused by exhausted CPU.

      [1]

      09:45:07,418 WARN  [org.apache.activemq.artemis.core.server] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@2646099c-962838060)) AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=unable to open : ActiveMQIOErrorException[errorType=IO_ERROR message=AMQ149003: File not opened]
              at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.openFile(JournalFilesRepository.java:423) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.moveNextFile(JournalImpl.java:2885) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:2842) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2568) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$200(JournalImpl.java:87) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl$4.run(JournalImpl.java:889) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
              at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101) [artemis-commons-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
              at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]
      

      [2]

      public JournalFile openFile() throws InterruptedException, ActiveMQIOErrorException {
            if (logger.isTraceEnabled()) {
               logger.trace("enqueueOpenFile with openedFiles.size=" + openedFiles.size());
            }
      
            if (openFilesExecutor == null) {
               pushOpenRunnable.run();
            } else {
               openFilesExecutor.execute(pushOpenRunnable);
            }
      
            JournalFile nextFile = openedFiles.poll(5, TimeUnit.SECONDS);
            if (nextFile == null) {
               fileFactory.onIOError(ActiveMQJournalBundle.BUNDLE.fileNotOpened(), "unable to open ", null);
               // We need to reconnect the current file with the timed buffer as we were not able to roll the file forward
               // If you don't do this you will get a NPE in TimedBuffer::checkSize where it uses the bufferobserver
               fileFactory.activateBuffer(journal.getCurrentFile().getFile());
               throw ActiveMQJournalBundle.BUNDLE.fileNotOpened();
            }
      
            if (logger.isTraceEnabled()) {
               logger.trace("Returning file " + nextFile);
            }
      
            return nextFile;
         }
      

              thofman Tomas Hofman
              thofman Tomas Hofman
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

                Created:
                Updated:
                Resolved: