Set up two AMQ 7.7 brokers in a non-replicated, fully-connected mesh. To make the problem easier to reproduce, I set -Xmx for both brokers to 512MB – just so they run out of heap quickly.
Here is the test topology:
                   +-----------+     +------------+
publisher client-->| upstream  |---->| downstream |--> temporary subscriber
                   |  broker   |     |   broker   |
                   +-----------+     +------------+
I'm using the term "downstream" for the broker that the subscriber will connect to, and "upstream" for the broker to which the publisher will connect. Of course, this is a symmetric cluster, so the roles are interchangeable.
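To make the client side of this topology concrete, here is a minimal sketch of the connection details my test clients use, written against the Core JMS client (the AMQP equivalent is structurally the same). The host names, port, topic name, client ID, and subscription name are assumptions from my test setup, not values the brokers require.

import javax.jms.ConnectionFactory;
import org.apache.activemq.artemis.jms.client.ActiveMQConnectionFactory;

// Hypothetical endpoints for the test topology. Hosts, port, topic name,
// client ID, and subscription name are assumptions, not anything mandated
// by the brokers.
class TestEndpoints {
    // The publisher connects to the "upstream" broker...
    static final ConnectionFactory UPSTREAM =
        new ActiveMQConnectionFactory("tcp://upstream-host:61616");

    // ...and the durable consumer connects to the "downstream" broker.
    static final ConnectionFactory DOWNSTREAM =
        new ActiveMQConnectionFactory("tcp://downstream-host:61616");

    static final String TOPIC = "test.topic";       // topic the durable subscription is on
    static final String CLIENT_ID = "sub-1";        // fixed client ID for the durable subscriber
    static final String SUB_NAME = "sub-1-durable"; // durable subscription name
}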
Use a JMS client to create a durable consumer on downstream, on a specific topic, with a specific client ID. (These client steps are sketched in code below.)
Use a JMS publisher to publish a message to the same topic, on upstream. Verify that the consumer receives the message.
Disconnect the consumer.
Use a JMS publisher to publish an unlimited number of, say, 50kB messages to the same topic. The larger the messages, the quicker the problem reproduces.
After a few hundred kB of messages, downstream fails with an OOM error and no stack backtrace. Thereafter, it is effectively dead.
Upstream starts logging messages saying it can't connect to downstream. This is expected.
Restart downstream – this requires a 'kill -9' in my tests.
Almost immediately, upstream fails with an OOM – again no backtrace. Upstream is now effectively dead, and has to be restarted.
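The steps above are easy to drive with a short JMS program. The sketch below, which reuses the hypothetical TestEndpoints class from earlier, creates the durable subscription on downstream, checks that one probe message published on upstream arrives, disconnects the subscriber, and then publishes 50kB messages in an endless loop on upstream. It reflects my test procedure rather than anything definitive.

import javax.jms.*;
import java.util.Arrays;

// Sketch of the reproduction steps; uses the hypothetical TestEndpoints above.
public class ReproduceOom {
    public static void main(String[] args) throws Exception {
        // Step 1: create a durable consumer on downstream with a fixed client ID.
        Connection subConn = TestEndpoints.DOWNSTREAM.createConnection();
        subConn.setClientID(TestEndpoints.CLIENT_ID);
        subConn.start();
        Session subSession = subConn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Topic topic = subSession.createTopic(TestEndpoints.TOPIC);
        MessageConsumer durable =
            subSession.createDurableSubscriber(topic, TestEndpoints.SUB_NAME);

        // Step 2: publish one message on upstream and check the consumer gets it,
        // which confirms messages are being forwarded across the mesh.
        Connection pubConn = TestEndpoints.UPSTREAM.createConnection();
        pubConn.start();
        Session pubSession = pubConn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageProducer producer =
            pubSession.createProducer(pubSession.createTopic(TestEndpoints.TOPIC));
        producer.send(pubSession.createTextMessage("probe"));
        Message probe = durable.receive(10_000);
        System.out.println("Probe received: " + (probe != null));

        // Step 3: disconnect the durable consumer; the subscription itself survives.
        subConn.close();

        // Step 4: publish 50kB messages indefinitely; they accumulate for the
        // now-disconnected durable subscription.
        byte[] payload = new byte[50 * 1024];
        Arrays.fill(payload, (byte) 'x');
        long count = 0;
        while (true) {
            BytesMessage msg = pubSession.createBytesMessage();
            msg.writeBytes(payload);
            producer.send(msg);
            if (++count % 1000 == 0)
                System.out.println("Sent " + count + " messages");
        }
    }
}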
Note that paging does start on downstream – this can be seen in the log. A heap dump from downstream after failure shows the heap completely full of message-related objects – the exact class depends on the wire protocol in use.
I can reproduce this problem with AMQP and Core clients. I have observed other problems with durable consumers and AMQP, but this one does not seem to depend on protocol.