Uploaded image for project: 'AMQ Broker'
  1. AMQ Broker
  2. ENTMQBR-4403

[LTS] ARTEMIS-3037 JournalImpl#checkKnownRecordID() implementation can leave a thread hanging in WAITING state

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Major Major
    • None
    • None
    • journal
    • None

      The JournalImpl#checkKnownRecordID() implementation contains following code:

            final SimpleFuture<Boolean> known = new SimpleFutureImpl<>();
      
            // retry on the append thread. maybe the appender thread is not keeping up.
            appendExecutor.execute(new Runnable() {
               @Override
               public void run() {
                  journalLock.readLock().lock();
                  try {
      
                     known.set(records.containsKey(id)
                        || pendingRecords.contains(id)
                        || (compactor != null && compactor.containsRecord(id)));
                  } finally {
                     journalLock.readLock().unlock();
                  }
               }
            });
      
            if (!known.get()) {
                ...
            }
      

      If the code in the Runnable fails with exception before the known future value is set, the main thread would be left in the WAITING state forever. Exception handling should be added that would cancel the future in case of exception.

      We've observed cases where following threads were left hanging, while no other threads operating inside JournalImpl were present. I believe that JournalImpl#checkKnownRecordID() implementation may be responsible for that:

      "Thread-16 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@423fe5c3)" #1078 prio=5 os_prio=64 tid=0x000000011c34a000 nid=0x4eb waiting on condition [0xfffffffabe9ad000]
         java.lang.Thread.State: WAITING (parking)
              at sun.misc.Unsafe.park(Native Method)
              - parking to wait for  <0xfffffffbe73c29e8> (a java.util.concurrent.CountDownLatch$Sync)
              at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
              at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
              at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
              at org.apache.activemq.artemis.utils.SimpleFutureImpl.get(SimpleFutureImpl.java:62)
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.checkKnownRecordID(JournalImpl.java:1080)
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendDeleteRecord(JournalImpl.java:950)
              at org.apache.activemq.artemis.core.persistence.impl.journal.AbstractJournalStorageManager.confirmPendingLargeMessage(AbstractJournalStorageManager.java:361)
              at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.confirmLargeMessageSend(PostOfficeImpl.java:1390)
              - locked <0xfffffffbe73aa1b0> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl)
              at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.processRoute(PostOfficeImpl.java:1336)
              at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:980)
              at org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl.route(PostOfficeImpl.java:871)
              at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:2045)
              - locked <0xfffffffb19447fb8> (a org.apache.activemq.artemis.core.server.impl.ServerSessionImpl)
              at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.doSend(ServerSessionImpl.java:1989)
              - locked <0xfffffffb19447fb8> (a org.apache.activemq.artemis.core.server.impl.ServerSessionImpl)
              at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.sendContinuations(ServerSessionPacketHandler.java:1034)
              - locked <0xfffffffb1962b900> (a java.lang.Object)
              at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:312)
              at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:285)
              at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler$$Lambda$651/2097400985.onMessage(Unknown Source)
              at org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33)
              at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
              at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$413/494003142.run(Unknown Source)
              at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
              at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
              at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
              at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$413/494003142.run(Unknown Source)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java)
              at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
      
         Locked ownable synchronizers:
              - <0xfffffffba1800ca0> (a java.util.concurrent.ThreadPoolExecutor$Worker)
      
      "Thread-82 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$7@3bde9e44)" #2130 prio=5 os_prio=64 tid=0x000000017b6df800 nid=0x907 waiting for monitor entry [0xffffffff045de000]
         java.lang.Thread.State: BLOCKED (on object monitor)
              at org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl.getEncodeSize(LargeServerMessageImpl.java:178)
              - waiting to lock <0xfffffffbe73aa1b0> (a org.apache.activemq.artemis.core.persistence.impl.journal.LargeServerMessageImpl)
              at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:59)
              at org.apache.activemq.artemis.core.persistence.impl.journal.codec.LargeMessagePersister.getEncodeSize(LargeMessagePersister.java:25)
              at org.apache.activemq.artemis.core.journal.impl.dataformat.JournalAddRecord.getEncodeSize(JournalAddRecord.java:79)
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.appendRecord(JournalImpl.java:2792)
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl.access$100(JournalImpl.java:91)
              at org.apache.activemq.artemis.core.journal.impl.JournalImpl$1.run(JournalImpl.java:850)
      

              rhn-support-jbertram Justin Bertram
              thofman Tomas Hofman
              Tiago Bueno Tiago Bueno
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated:
                Resolved: