- Enhancement
- Resolution: Done
- Critical
- 15.0.0.Beta1
- None
I can see in our CI jobs that Artemis sometimes stops because of error [1]. I looked at the code [2] where the exception is thrown and I think it could be improved a bit.
Customer Impact: If the Artemis journal is located on a slower file system (such as NFS), the server might crash under load. This makes the service unavailable, and the server must be restarted to recover.
The first thing I noticed is that the 5-second timeout is not configurable. I agree that it should be enough in most cases, but if someone wants to use NFS for the Artemis journal and doesn't care about performance, we should allow them to tune this value. Additionally, the timeout doesn't take the size of the journal files into account.
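As a rough illustration of the first point, the timeout could be passed in instead of being hard-coded. This is only a minimal sketch: the class OpenFileQueue and its members are hypothetical stand-ins, not the actual JournalFilesRepository code, and the default simply mirrors the current 5 seconds.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the part of the repository that hands out opened files.
final class OpenFileQueue<T> {
    // Mirrors the value that is hard-coded today.
    static final long DEFAULT_OPEN_TIMEOUT_SECONDS = 5;

    private final BlockingQueue<T> openedFiles = new LinkedBlockingQueue<>();
    private final long openTimeout;
    private final TimeUnit unit;

    OpenFileQueue() {
        this(DEFAULT_OPEN_TIMEOUT_SECONDS, TimeUnit.SECONDS);
    }

    // The timeout comes from configuration instead of a literal in the code.
    OpenFileQueue(long openTimeout, TimeUnit unit) {
        if (openTimeout <= 0) {
            throw new IllegalArgumentException("open timeout must be positive");
        }
        this.openTimeout = openTimeout;
        this.unit = unit;
    }

    // Called by the task that has finished opening a file.
    void push(T file) {
        openedFiles.offer(file);
    }

    // Returns the next opened file, or null if none arrived within the configured timeout.
    T poll() throws InterruptedException {
        return openedFiles.poll(openTimeout, unit);
    }
}
```

With something like this, a deployment on NFS could construct the queue with a larger value, while everyone else keeps the default.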
The second thing is that when openedFiles.poll() returns null, we can't tell whether the cause is an exhausted disk or an exhausted CPU. I think some kind of latch should be added that waits until pushOpenRunnable has been executed. That would confirm that the problem lies in the IO operations and was not caused by an exhausted CPU.
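One possible reading of the latch idea is sketched below. It assumes the latch is released as soon as the executor actually starts the open task, so a timeout before that point suggests the task was never scheduled (for example, CPU starvation), and a timeout after it suggests slow IO. The class LatchedOpenSketch, the Outcome enum, and the String file type are hypothetical and only serve the illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the actual JournalFilesRepository implementation.
final class LatchedOpenSketch {
    enum Outcome { OPENED, TASK_NOT_STARTED, IO_TIMEOUT }

    static Outcome openFile(Executor openFilesExecutor,
                            Runnable pushOpenRunnable,
                            BlockingQueue<String> openedFiles,
                            long timeout,
                            TimeUnit unit) throws InterruptedException {
        CountDownLatch started = new CountDownLatch(1);

        // Release the latch once the executor thread picks up the task,
        // then do the actual file opening.
        openFilesExecutor.execute(() -> {
            started.countDown();
            pushOpenRunnable.run();
        });

        // The task never started: the executor is starved, not the disk.
        if (!started.await(timeout, unit)) {
            return Outcome.TASK_NOT_STARTED;
        }

        // The task is running, so from here on the wait measures the IO itself.
        String nextFile = openedFiles.poll(timeout, unit);
        return nextFile == null ? Outcome.IO_TIMEOUT : Outcome.OPENED;
    }
}
```

Only the IO_TIMEOUT outcome would then need to be reported as a critical IO error; TASK_NOT_STARTED could be logged or retried instead of shutting the server down.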
[1]
09:45:07,418 WARN [org.apache.activemq.artemis.core.server] (Thread-10 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$4@2646099c-962838060)) AMQ222010: Critical IO Error, shutting down the server. file=NULL, message=unable to open : ActiveMQIOErrorException[errorType=IO_ERROR message=AMQ149003: File not opened]
    at org.apache.activemq.artemis.core.journal.impl.JournalFilesRepository.openFile(JournalFilesRepository.java:423) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.moveNextFile(JournalImpl.java:2885) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.switchFileIfNecessary(JournalImpl.java:2842) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2568) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$200(JournalImpl.java:87) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl$4.run(JournalImpl.java:889) [artemis-journal-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at org.apache.activemq.artemis.utils.OrderedExecutorFactory$OrderedExecutor$ExecutorTask.run(OrderedExecutorFactory.java:101) [artemis-commons-1.5.3.002-redhat-1.jar:1.5.3.002-redhat-1]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [rt.jar:1.8.0_111]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [rt.jar:1.8.0_111]
    at java.lang.Thread.run(Thread.java:745) [rt.jar:1.8.0_111]
[2]
public JournalFile openFile() throws InterruptedException, ActiveMQIOErrorException {
   if (logger.isTraceEnabled()) {
      logger.trace("enqueueOpenFile with openedFiles.size=" + openedFiles.size());
   }

   if (openFilesExecutor == null) {
      pushOpenRunnable.run();
   } else {
      openFilesExecutor.execute(pushOpenRunnable);
   }

   JournalFile nextFile = openedFiles.poll(5, TimeUnit.SECONDS);
   if (nextFile == null) {
      fileFactory.onIOError(ActiveMQJournalBundle.BUNDLE.fileNotOpened(), "unable to open ", null);
      // We need to reconnect the current file with the timed buffer as we were not able to roll the file forward
      // If you don't do this you will get a NPE in TimedBuffer::checkSize where it uses the bufferobserver
      fileFactory.activateBuffer(journal.getCurrentFile().getFile());
      throw ActiveMQJournalBundle.BUNDLE.fileNotOpened();
   }

   if (logger.isTraceEnabled()) {
      logger.trace("Returning file " + nextFile);
   }

   return nextFile;
}
- clones
  JBEAP-9223 (7.2.z) Artemis throws Critical IO Error if new journal file is not created in 5 seconds (Closed)
- is related to
  HAL-1604 Make journal-file-open-timeout attribute configurable for messaging server resource (Resolved)
  ENTMQBR-2195 Create upstream test on DEFAULT_JOURNAL_FILE_OPEN_TIMEOUT value (Closed)
- relates to
  ARTEMIS-1237