Uploaded image for project: 'AMQ Broker'
  1. AMQ Broker
  2. ENTMQBR-2673

AMQ222109: Timed out waiting for write lock on consumer

    XMLWordPrintable

Details

    • +
    • Verified in a release

    Description

      The broker went down in production under high load.

      At 11:23 the master under high load started paging and soon after went unresponsive:

      2019-07-10 11:23:25,338 WARN  [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'com_ig_trade_v0_legacy_order_history'; size is currently: 104,858,652 bytes; max-size-bytes: 104,857,600; global-size-bytes: 201,240,334
      2019-07-10 11:57:45,862 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: syscall:read(..) failed: Connection reset by peer [code=GENERIC_EXCEPTION]
      2019-07-10 12:16:42,962 WARN  [org.apache.activemq.artemis.core.server] AMQ222109: Timed out waiting for write lock on consumer. Check the Thread dump
      2019-07-10 12:16:44,527 WARN  [org.apache.activemq.artemis.core.server] AMQ222199: Thread dump: ...
      

      Then, the slave became live (split-brain) and started paging and crashed as well:

      2019-07-10 12:16:12,908 WARN  [org.apache.activemq.artemis.core.client] AMQ212037: Connection failure has been detected: AMQ219015: The connection was disconnected because of server shutdown [code=DISCONNECTED]
      2019-07-10 12:16:12,937 INFO  [org.apache.activemq.artemis.core.server] AMQ221071: Failing over based on quorum vote results.
      2019-07-10 12:16:16,599 INFO  [org.apache.activemq.artemis.core.server] AMQ221007: Server is now live
      2019-07-10 12:16:20,116 WARN  [org.apache.activemq.artemis.core.server] AMQ222038: Starting paging on address 'com_ig_trade_v0_close_out_monitor_account_state'; size is currently: 104,860,066 bytes; max-size-bytes: 104,857,600; global-size-bytes: 221,121,610
      2019-07-10 12:17:07,971 WARN  [org.apache.activemq.artemis.core.server] AMQ222109: Timed out waiting for write lock on consumer. Check the Thread dump
      2019-07-10 12:17:11,759 WARN  [org.apache.activemq.artemis.core.server] AMQ222199: Thread dump: ...
      

      Looking at the master's thread dump produced soon after paging started, there seems to be a deadlock between thread Id=685613 (WATING) thread Id=2497666 (BLOCKED) which in turns is blocking other 161 threads thus causing the failover. This deadlock may be caused by the lock on method PagingStoreImpl.isPaging [1] that has been removed in Artemis 2.7 (AMQ 7.3) among other paging improvements [2]. No reproducer available at the moment. Other improvements on the paging system are present the upcoming 7.4 release, so it could be an upgrade worth taking.

      "Thread-18796 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@7dcf96dc)" Id=2497666 BLOCKED on org.apache.activemq.artemis.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator@4b31ac0e owned by "Thread-23 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@3fb8a940)" Id=685613
      at org.apache.activemq.artemis.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.hasNext(PageSubscriptionImpl.java:1332)
      -  blocked on org.apache.activemq.artemis.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator@4b31ac0e
      at org.apache.activemq.artemis.core.server.impl.QueueImpl.checkDepage(QueueImpl.java:2417)
      at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliverAsync(QueueImpl.java:806)
      at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.deliverIfNecessary(PageCursorProviderImpl.java:614)
      at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl.cleanup(PageCursorProviderImpl.java:367)
      -  locked org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl@58cba2dd
      at org.apache.activemq.artemis.core.paging.cursor.impl.PageCursorProviderImpl$1.run(PageCursorProviderImpl.java:290)
      at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
      at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
      at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
      at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$4/1107217291.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
      Number of locked synchronizers = 2
      - java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc0e8d9
      - java.util.concurrent.ThreadPoolExecutor$Worker@52572922
      
      "Thread-23 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$5@3fb8a940)" Id=685613 WAITING on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc0e8d9 owned by "Thread-18796 (ActiveMQ-IO-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$6@7dcf96dc)" Id=2497666
      at sun.misc.Unsafe.park(Native Method)
      -  waiting on java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync@2fc0e8d9
      at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:967)
      at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1283)
      at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:727)
      at org.apache.activemq.artemis.core.paging.impl.PagingStoreImpl.isPaging(PagingStoreImpl.java:278)
      at org.apache.activemq.artemis.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator.hasNext(PageSubscriptionImpl.java:1336)
      -  locked org.apache.activemq.artemis.core.paging.cursor.impl.PageSubscriptionImpl$CursorIterator@4b31ac0e
      at org.apache.activemq.artemis.core.server.impl.QueueImpl.checkDepage(QueueImpl.java:2417)
      at org.apache.activemq.artemis.core.server.impl.QueueImpl.deliverAsync(QueueImpl.java:806)
      at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.forceDelivery(ServerConsumerImpl.java:1093)
      at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.promptDelivery(ServerConsumerImpl.java:1085)
      at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.forceDelivery(ServerConsumerImpl.java:591)
      -  locked org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl@184f2bb9
      at org.apache.activemq.artemis.core.server.impl.ServerConsumerImpl.forceDelivery(ServerConsumerImpl.java:578)
      at org.apache.activemq.artemis.core.server.impl.ServerSessionImpl.forceConsumerDelivery(ServerSessionImpl.java:858)
      at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.slowPacketHandler(ServerSessionPacketHandler.java:590)
      at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler.onMessagePacket(ServerSessionPacketHandler.java:287)
      at org.apache.activemq.artemis.core.protocol.core.ServerSessionPacketHandler$$Lambda$65/1826790006.onMessage(Unknown Source)
      at org.apache.activemq.artemis.utils.actors.Actor.doTask(Actor.java:33)
      at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
      at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$4/1107217291.run(Unknown Source)
      at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42)
      at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31)
      at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:66)
      at org.apache.activemq.artemis.utils.actors.ProcessorBase$$Lambda$4/1107217291.run(Unknown Source)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
      at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
      Number of locked synchronizers = 1
      - java.util.concurrent.ThreadPoolExecutor$Worker@3af247e0
      

      [1] https://github.com/apache/activemq-artemis/blob/2.6.4/artemis-server/src/main/java/org/apache/activemq/artemis/core/paging/impl/PagingStoreImpl.java#L278
      [2] https://issues.apache.org/jira/browse/ARTEMIS-2216

      Attachments

        1. deadlock-suspect.txt
          5 kB
        2. master-thread-dump.txt
          1.31 MB
        3. slave-thread-dump.txt
          355 kB

        Activity

          People

            fnigro Francesco Nigro
            rhn-support-fvaleri Federico Valeri
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: