-
Bug
-
Resolution: Done
-
Major
-
None
-
None
-
None
The JournalImpl#checkKnownRecordID() implementation contains following code:
final SimpleFuture<Boolean> known = new SimpleFutureImpl<>(); // retry on the append thread. maybe the appender thread is not keeping up. appendExecutor.execute(new Runnable() { @Override public void run() { journalLock.readLock().lock(); try { known.set(records.containsKey(id) || pendingRecords.contains(id) || (compactor != null && compactor.containsRecord(id))); } finally { journalLock.readLock().unlock(); } } }); if (!known.get()) { ... }
If the code in the Runnable fails with exception before the known future value is set, the main thread would be left in the WAITING state forever. Exception handling should be added that would cancel the future in case of exception.
We've observed cases where following threads were left hanging, while no other threads operating inside JournalImpl were present. I believe that JournalImpl#checkKnownRecordID() implementation may be responsible for that:
"Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@423fe5c3)" #1078 prio=5 os_prio=64 tid=0x000000011c34a000 nid=0x4eb waiting on condition [0xfffffffabe9ad000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0xfffffffbe73c29e8> (a java.util.concurrent.CountDownLatch$Sync) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) at org.apache.activemq.artemis.utils.SimpleFutureImpl.get(SimpleFutureImpl.java:62) at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1080) at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:950) at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.confirmPendingLargeMessage(AbstractJournalStorageManager.java:361) at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.confirmLargeMessageSend(PostOfficeImpl.java:1390) - locked <0xfffffffbe73aa1b0> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl) at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1336) at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:980) at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:871) at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:2045) - locked <0xfffffffb19447fb8> (a org.apache.activemq.artemis.core.server.impl.ServerSessionImpl) at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:1989) - locked <0xfffffffb19447fb8> (a org.apache.activemq.artemis.core.server.impl.ServerSessionImpl) at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.sendContinuations(ServerSessionPacketHandler.java:1034) - locked <0xfffffffb1962b900> (a java.lang.Object) at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:312) at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:285) at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler$$Lambda$651/2097400985.onMessage(Unknown Source) at org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33) at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$413/494003142.run(Unknown Source) at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66) at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$413/494003142.run(Unknown Source) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java) at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) Locked ownable synchronizers: - <0xfffffffba1800ca0> (a java.util.concurrent.ThreadPoolExecutor$Worker) "Thread-82 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$7@3bde9e44)" #2130 prio=5 os_prio=64 tid=0x000000017b6df800 nid=0x907 waiting for monitor entry [0xffffffff045de000] java.lang.Thread.State: BLOCKED (on object monitor) at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getEncodeSize(LargeServerMessageImpl.java:178) - waiting to lock <0xfffffffbe73aa1b0> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl) at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:59) at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:25) at org.apache.activemq.artemis.core.journal.impl.dataformat.JournalAddRecord.getEncodeSize(JournalAddRecord.java:79) at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2792) at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$100(JournalImpl.java:91) at org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:850)
- incorporates
-
JBEAP-20765 (7.4.z) ARTEMIS-3037 JournalImpl#checkKnownRecordID() implementation can leave a thread hanging in WAITING state
- Closed
- is cloned by
-
ENTMQBR-4403 [LTS] ARTEMIS-3037 JournalImpl#checkKnownRecordID() implementation can leave a thread hanging in WAITING state
- Closed