Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: AMQ 7.11.1.GA
Affects Version/s: AMQ 7.10.2.GA
Component/s: None
Labels:
- CR1
- upstream-test-coverage

Blocked: False
Blocked Reason: None
Ready: False
GSS Priority:
Target Release: AMQ 7.11.1.GA
Upstream Jira: https://issues.apache.org/jira/browse/ARTEMIS-4285
Intelligence Requested:
Market:

Severity: Important
Customer Impact:

Customer Escalated

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

AMQ 7.10.2 is configured with JDBC persistence. Scheduled redelivery is enabled in the broker (this is a key facet of the problem).

Applications are producing constantly to the broker, but a consuming application is repeatedly rejecting messages it receives. That is, it either rolls back a transaction, or negatively acknowledges the message.

Eventually these rejected messages end up on the dead letter queue (DLQ), which is the expected behaviour. However, between the time a message is first produced and the time it ends up on the DLQ, unexpected rows accumulate in the messages table in the database.

Because the rejected messages have to be scheduled for redelivery, a row is added to the messages table with the `userrecordtype` column set to `36`. If one of these records were created for each message, or even one for each redelivery attempt, this would not be a problem. However, with just the right combination of redelivery delay, number of concurrent consumers, and the time the application waits before it rejects the message, we can see hundreds of these 'type 36' records for each message delivered.

So long as the producers keep producing, and the consumers keep rejecting messages, these records grow without limit. Eventually, we end up with the messages table containing tens of millions of rows, and then the broker fails because of database timeouts.

Worse, the broker will not start again, because it takes days for it to process the tens of millions of rows in the table, and operations keep timing out.

To reproduce this behaviour we need:

- AMQ 7.10.x, configured with JDBC persistence and a short-ish redelivery delay. The shorter the delay, the less time it takes to see the problem.
- A relational database. The problem has been seen with Oracle and Postgres.
- A tool to monitor the messages table in the database (sqlplus, Squirrel...)
- An application that consumes messages and rejects them in some way (e.g., uses client acknowledgement or a transacted session and never acknowledges or commits)
- Something to put messages onto the broker (e.g., `artemis producer`)
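For the monitoring step, something along these lines may be enough to watch the 'type 36' count grow while the test runs. This is only a sketch: it assumes the journal table is reachable through a Python DB-API connection, that it is called `messages` (the real table name depends on the broker's JDBC configuration), and that it has the `userrecordtype` column described above. The function name is made up for illustration, and the in-memory SQLite table is a stand-in so the example runs without a broker.

```python
import sqlite3

SCHEDULED_REDELIVERY_TYPE = 36  # the 'type 36' value named in this report


def count_by_user_record_type(conn, table="messages"):
    """Return {userrecordtype: row_count} for the given table.

    `conn` is any DB-API connection; `table` is an assumed table name.
    """
    # Table names cannot be bound as query parameters, so check the name
    # before interpolating it into the SQL text.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    cur.execute(
        f"SELECT userrecordtype, COUNT(*) FROM {table} GROUP BY userrecordtype"
    )
    return {row[0]: row[1] for row in cur.fetchall()}


if __name__ == "__main__":
    # Stand-in database with a simplified, assumed two-column layout.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE messages (id INTEGER, userrecordtype INTEGER)")
    # One row of an arbitrary other type, three 'type 36' rows.
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?)",
        [(1, 1), (2, SCHEDULED_REDELIVERY_TYPE),
         (3, SCHEDULED_REDELIVERY_TYPE), (4, SCHEDULED_REDELIVERY_TYPE)],
    )
    counts = count_by_user_record_type(conn)
    print(counts.get(SCHEDULED_REDELIVERY_TYPE, 0))  # prints 3
```

Against a real Oracle or Postgres database, the same function would be called in a loop with a connection from the appropriate driver; a 'type 36' count that keeps rising while the consumers reject messages is the symptom described here.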

If the test system is fast, it will be necessary to slow down the consumer application in order to see the problem. The problem is only severe – and easy to see – if messages are produced more quickly than they can be routed to the DLQ. Once the messages are on the DLQ, all the related 'type 36' records are removed; it's the combination of fast producer and a slow-ish consumer that creates the problem.
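The producer/consumer imbalance can be put into rough numbers with a toy model. This is not how the broker computes anything; it is a back-of-envelope sketch that assumes constant rates, assumes every in-flight message carries the same number of 'type 36' rows, and assumes those rows vanish as soon as the message reaches the DLQ. The function name, parameters, and all the numbers are hypothetical.

```python
def backlog_rows(produce_rate, dlq_rate, rows_per_message, seconds):
    """Estimate 'type 36' rows after `seconds` under a simple linear model.

    produce_rate     -- messages produced per second (hypothetical)
    dlq_rate         -- messages reaching the DLQ per second (hypothetical)
    rows_per_message -- 'type 36' rows held per in-flight message (hypothetical)
    """
    # Messages produced but not yet on the DLQ; the backlog cannot be negative.
    in_flight = max(0.0, (produce_rate - dlq_rate) * seconds)
    return in_flight * rows_per_message


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from any real system.
    for hours in (1, 6, 24):
        rows = backlog_rows(produce_rate=100, dlq_rate=80,
                            rows_per_message=200, seconds=hours * 3600)
        print(f"{hours:>2} h: about {rows:,.0f} rows")
```

Under these assumptions the row count is zero whenever the DLQ keeps up with the producers and grows linearly, without bound, whenever it does not. This matches the observation that a fast test system has to be slowed down before the problem shows.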

Assignee: Clebert Suconic

Reporter: Kevin Boone (Inactive)

Tester: Samuel Gajdos

Votes: 0

Watchers: 8

Created: 2023/04/27 10:02 AM

Updated: 2024/07/25 2:16 PM

Resolved: 2023/06/16 3:52 PM
