Loading...

XML

Word

Printable

Details

Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: JBoss A-MQ 6.3
Affects Version/s: 6.1.1
Component/s: broker
Labels:
None

Fix Build:
ER2
Steps to Reproduce:
Hide

Steps to reproduce:

lower the cleanup interval to 10 sec

lower the pool size to 2

send/receive messages from transacted and non-transacted consumers (we actually have xa and local transaction consumers)
Show
Steps to reproduce: lower the cleanup interval to 10 sec lower the pool size to 2 send/receive messages from transacted and non-transacted consumers (we actually have xa and local transaction consumers)

SFDC Cases Counter:
SFDC Cases Links:

Description

We were able to reproduce the problem using a small pool size, we could look into the problem more deeply.

There were two types of blocked threads:
1) borrow connection
2) acquire lock

Looking at the code we were able to identify the #2 lock, which is makes the transactions wait:

DefaultJDBCAdapter.cleanupExclusiveLock

The lock has 3 states:
1) open - read lock can be acquired freely
2) half-closed

outstanding read locks needs to finish first
write lock pending (blocking until read is done)
no new read lock handed out (new requests blocking)
3) closed - until write lock(s) finish new readers blocks

The cleanupExclusiveLock protects the internal structure of the database from changing in an unexpected way. It is used by the DB cleanup (doDeleteOldMessages) in WRITE mode. All other transactions require READ mode.

The actual deadlock is caused by this lock, which blocks transactions to finish and others are blocked by the lack of free connection.

For the exact explanation please check the overview picture attached.

"Lock" represents the rw lock
"Conn" is the connection pool.
"Tx1-3" represent the transactions.

There are two types of transactions:
tx1 - "not transactional" - they first acquire the read lock and then getConnection.
(they need to come first, in "open mode")

tx3 - "transactional" - they try to acquire a the read lock whilst owning a connection!
(they must come in "half-closed" mode.)

In case of sending/receiving in transaction (TX3), the connection is acquired on BEGIN, so there is a connection already in the TransactionContext.
(eg. "ActiveMQ Transport: tcp:///xx.xx.xx.xx:41767@14016") in jstack.xxxxx.20160531214821.AMQ.log

In case of sending/receiving using local transaction (TX1), the connection is first acquired inside the lock.
(eg. "ActiveMQ Transport: tcp:///xx.xx.xx.xx:27378@14016")

The problem occurs, if TX1 type operations cannot acquire a connection, thus blocking.
Tx2 (cleanup) is blocked, due to Tx1.
TX3 keeps getting connections first and wait for the clean-up to finish whilst owning the connection.

If tx3 quick enough it will make the cycle complete, then this situation will exhaust the database pool, so the server will loose the master lock.

Regarding the fix, we don't think that getting the connection before the clean-up would help, because in our cases the process was not able to enter into the write locked area.

You will need to move all getConnection() calls out of the read-locks or only execute the statement within the read-lock.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

amqdeadlock.jpg
17 kB
2016/07/14 9:12 AM

Issue Links

relates to: AMQ-6370 Loading...

Activity

People

Assignee:: Gary Tully

Reporter:: Joe Luo

Votes:: 1 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2016/07/14 9:11 AM

Updated:: 2021/10/24 6:09 AM

Resolved:: 2016/07/26 5:08 AM