AMQ Broker / ENTMQBR-7996

AMQ 7 with JDBC persistence: broker fails owing to a huge number of 'type 36' records in the messages table


    • Important
    • Customer Escalated

      AMQ 7.10.2 is configured with JDBC persistence. Scheduled redelivery is enabled in the broker (this is a key facet of the problem).

      Applications are producing constantly to the broker, but a consuming application is repeatedly rejecting messages it receives. That is, it either rolls back a transaction, or negatively acknowledges the message.

      Eventually these rejected messages end up on the dead letter queue (DLQ), which is the expected behaviour. However, between the time a message is first produced and the time it reaches the DLQ, the messages table in the database behaves unexpectedly.

      Because the rejected messages have to be scheduled for redelivery, a row is added to the messages table with the `userrecordtype` column set to `36`. If one of these records were created for each message, or even one for each redelivery attempt, this would not be a problem. However, with just the right combination of redelivery delay, number of concurrent consumers, and the time the application waits before it rejects the message, we can see hundreds of these 'type 36' records for each message delivered.

      So long as the producers keep producing and the consumers keep rejecting messages, the number of these records grows without limit. Eventually the messages table contains tens of millions of rows, and the broker then fails because of database timeouts.

      Worse, the broker will not start again: it takes days to process the tens of millions of rows in the table, and operations keep timing out.

      To reproduce this behaviour we need:

      • AMQ 7.10.x, configured with a short-ish redelivery delay. The shorter it is, the less time it takes to see the problem.
      • A relational database. The problem has been seen with Oracle and Postgres.
      • A tool to monitor the messages table in the database (sqlplus, Squirrel...)
      • An application that consumes messages, and rejects them in some way (e.g., defines client acknowledgement and does not commit)
      • Something to put messages onto the broker (e.g., `artemis producer`)
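
      For the monitoring step, a minimal sketch of a row count grouped by `userrecordtype` may be more convenient than running queries by hand. It assumes the messages table has the default name `MESSAGES` (the name is configurable in the broker) and that `conn` is any Python DB-API connection to the broker's database. The function name `record_type_counts` is hypothetical, and SQLite with invented rows is used below only so that the sketch runs without a broker.

```python
import re
import sqlite3  # stand-in driver for the demo only; use your Oracle/Postgres driver

RECORD_TYPE = 36  # the 'type 36' value described in this report


def record_type_counts(conn, table="MESSAGES"):
    """Return {userrecordtype: row_count} for the given messages table.

    `table` defaults to MESSAGES, which is an assumption about the
    broker's JDBC configuration.
    """
    # Table names cannot be bound as query parameters, so validate
    # the identifier before interpolating it into the SQL text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    try:
        cur.execute(
            f"SELECT userrecordtype, COUNT(*) FROM {table} "
            "GROUP BY userrecordtype ORDER BY userrecordtype"
        )
        return {rtype: count for rtype, count in cur.fetchall()}
    finally:
        cur.close()


if __name__ == "__main__":
    # Demo with invented rows in an in-memory SQLite table.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE MESSAGES (id INTEGER, userrecordtype INTEGER)")
    demo.executemany(
        "INSERT INTO MESSAGES VALUES (?, ?)",
        [(1, 1), (1, 36), (1, 36), (2, 36)],
    )
    counts = record_type_counts(demo)
    print(counts.get(RECORD_TYPE, 0), "type 36 rows out of", sum(counts.values()))
```

      Running this repeatedly during the test should show whether the count for type 36 dominates the table.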

      If the test system is fast, it will be necessary to slow down the consumer application in order to see the problem. The problem is only severe (and easy to see) if messages are produced more quickly than they can be routed to the DLQ. Once a message is on the DLQ, all of its related 'type 36' records are removed; it is the combination of a fast producer and a slow-ish consumer that creates the problem.
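
      Because the records are removed once messages reach the DLQ, a single count does not distinguish a transient backlog from unbounded growth. The sketch below samples the count several times and checks whether it only ever rises. The names `sample_counts` and `grows_without_draining` are hypothetical, and `read_count` stands for any zero-argument callable that returns the current number of 'type 36' rows (for example, one that runs a `COUNT(*)` query filtered on `userrecordtype = 36`).

```python
import time


def sample_counts(read_count, samples=5, interval_s=10.0, sleep=time.sleep):
    """Call read_count() `samples` times, pausing interval_s between calls."""
    counts = []
    for i in range(samples):
        if i:  # no pause before the first sample
            sleep(interval_s)
        counts.append(read_count())
    return counts


def grows_without_draining(counts):
    """True if the samples never decrease and the last exceeds the first."""
    if len(counts) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(counts, counts[1:]))
    return never_drops and counts[-1] > counts[0]


if __name__ == "__main__":
    # Invented sample values standing in for real database reads.
    fake_reads = iter([10, 250, 900])
    observed = sample_counts(lambda: next(fake_reads), samples=3, interval_s=0)
    print(observed, grows_without_draining(observed))
```

      The sampling interval is a judgment call; it should probably be longer than the configured redelivery delay so that each sample spans at least one redelivery cycle.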


            csuconic@redhat.com Clebert Suconic
            rhn-support-kboone Kevin Boone
            Samuel Gajdos Samuel Gajdos