  1. JBoss A-MQ
  2. ENTMQ-2178

Messages Fail to Page In from JDBC Datasource On Broker Restart



      Workaround: only a full JVM restart or failover seems to work around the issue.


      Details to come, but in a nutshell:

      1. Configure a broker with a JDBC datasource (we used DBCP pooling)
      2. Introduce some latency in the datasource (accomplished here via a stored procedure to introduce delays and triggers to fire the procedure on inserts or updates)
      3. Start a producer and consumer on a queue
      4. Enable the triggers for the delay procedure to force an internal broker restart
      5. Disable the triggers and watch for lingering queue counts after the consumers reconnect and consume the remaining messages.


      Upon internal broker restart (triggered by JDBC IOException), we see that not all messages are successfully paged into the queue. The queue count accurately reflects the count for the destination in the JDBC source; however, none of the messages are browseable and the messages do not appear to be loaded into the cursor.

      In this instance, we had 1 "stuck" message that was queryable in the database and showed in the queue count, but was neither consumed by the consumer nor browseable via Hawtio:

      2018-02-16 15:08:49,158 | DEBUG | ce[amq02] Task-4 | Queue                            | che.activemq.broker.region.Queue 1926 | 162 - org.apache.activemq.activemq-osgi - 5.11.0.redhat-630310 | queue://TEST.INBOUND_QUEUE, subscriptions=1, memory=0%, size=1, pending=0 toPageIn: 1, force:false, Inflight: 0, pagedInMessages.size 0, pagedInPendingDispatch.size 0, enqueueCount: 19, dequeueCount: 23, memUsage:0, maxPageSize:200
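
      To make the mismatch in that line easier to spot across a large log, the counters can be extracted and compared mechanically. The following is a minimal sketch, not part of ActiveMQ: the class name QueueLogCheck, the method names, and the "looks stuck" heuristic are assumptions made here for illustration, based only on the fields visible in the line above. A single matching line is only a hint, since a healthy queue can briefly show the same values before it pages in; the symptom in this report is that the state persists.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not an ActiveMQ class): reads counters from a Queue DEBUG line. */
public class QueueLogCheck {

    // Accepts "name=1", "name: 1" and "name 1", the three styles used in the log line.
    private static final Pattern COUNTER = Pattern.compile(
        "\\b(size|pending|toPageIn|Inflight|pagedInMessages\\.size"
        + "|pagedInPendingDispatch\\.size|enqueueCount|dequeueCount)\\s*[=:]?\\s*(\\d+)");

    /** Returns the counters found in the line, keyed by the name used in the log. */
    public static Map<String, Long> parse(String logLine) {
        Map<String, Long> counters = new LinkedHashMap<>();
        Matcher m = COUNTER.matcher(logLine);
        while (m.find()) {
            counters.put(m.group(1), Long.parseLong(m.group(2)));
        }
        return counters;
    }

    /**
     * Assumed heuristic: the queue reports messages, but nothing is paged in,
     * pending dispatch, or in flight.
     */
    public static boolean looksStuck(Map<String, Long> c) {
        return c.getOrDefault("size", 0L) > 0
            && c.getOrDefault("pagedInMessages.size", 0L) == 0
            && c.getOrDefault("pagedInPendingDispatch.size", 0L) == 0
            && c.getOrDefault("Inflight", 0L) == 0;
    }

    /** enqueueCount minus dequeueCount; a negative value disagrees with size > 0. */
    public static long statsDelta(Map<String, Long> c) {
        return c.getOrDefault("enqueueCount", 0L) - c.getOrDefault("dequeueCount", 0L);
    }

    public static void main(String[] args) {
        String line = "queue://TEST.INBOUND_QUEUE, subscriptions=1, memory=0%, size=1, "
            + "pending=0 toPageIn: 1, force:false, Inflight: 0, pagedInMessages.size 0, "
            + "pagedInPendingDispatch.size 0, enqueueCount: 19, dequeueCount: 23, "
            + "memUsage:0, maxPageSize:200";
        Map<String, Long> counters = parse(line);
        System.out.println(counters);
        System.out.println("looksStuck=" + looksStuck(counters)
            + " enqueue-dequeue=" + statsDelta(counters));
    }
}
```

      For the line above, the check flags the queue (size=1 with nothing paged in), and it also shows that dequeueCount exceeds enqueueCount, which is another sign that the in-memory statistics and the store disagree.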
      

      We could send and receive other messages, but the message in question remained unbrowseable while still counting against the queue depth. This issue occurred after an IOException restarted the broker, with the same broker obtaining the lock following the restart. We were able to reproduce the issue several times. Upon subsequent internal restarts, some of the "stuck" messages were paged in and consumed, but in the case above one orphaned message remained.

      Restarting consumers had no effect, but a JVM restart on the broker resulted in the messages all being paged in and consumed.

      In the reproducer environment, there is a single network connection to another broker, but the queue depth reported there for the same queue was 0.

        1. 1000-msg-one-master-slave-1.tar.bz2
          25.96 MB
          Duane Hawkins
        2. 1000-msg-one-master-slave-2.tar.bz2
          26.97 MB
          Duane Hawkins
        3. 200-msgs-catch-npe-1.tar.bz2
          18.58 MB
          Duane Hawkins
        4. 200-msgs-catch-npe-2.tar.bz2
          22.62 MB
          Duane Hawkins
        5. 200msgs-node1.tar.bz2
          24.72 MB
          Duane Hawkins
        6. 200-msgs-no-delete-delay-1.tar.bz2
          18.49 MB
          Duane Hawkins
        7. 200msgs-xtra-instrumentation-1.tar.bz2
          22.47 MB
          Duane Hawkins
        8. 200msgs-xtra-instrumentation-2.tar.bz2
          23.07 MB
          Duane Hawkins
        9. count-off-by-1-200msgs.tar.bz2
          20.28 MB
          Duane Hawkins
        10. fixed-delays.tar.bz2
          28.95 MB
          Duane Hawkins
        11. LOG.txt
          319 kB
          Gary Tully
        12. multinode-amq1.tar.bz2
          2.27 MB
          Duane Hawkins
        13. multinode-amq2.tar.bz2
          5.33 MB
          Duane Hawkins
        14. node1-restarts.tar.bz2
          6.26 MB
          Duane Hawkins
        15. node2-restarts.tar.bz2
          2.15 MB
          Duane Hawkins
        16. reproducer-jvm-restarts-no-heap.tar.bz2
          2.71 MB
          Duane Hawkins
        17. reproducer-no-jvm-restarts-wo-new-instrumentation.tar.bz2
          24.01 MB
          Duane Hawkins
        18. reproducer-xtra-instrumentation.tar.bz2
          19.04 MB
          Duane Hawkins

              gtully@redhat.com Gary Tully
              rhn-support-dhawkins Duane Hawkins
              Votes: 1
              Watchers: 6
