AMQ Broker
ENTMQBR-3437

AMQP consumption stalls during high message throughput


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Done
    • Affects Version/s: None
    • Fix Version/s: AMQ 7.4.4.CR1
    • Component/s: None
    • Target Release:
    • Release Notes Text:
      Previously, if a broker in a live-backup group configured to use the replication high-availability (HA) policy experienced a very high throughput of AMQP messages (that is, a rate of 10,000 or more messages per second), message consumption might stall. In addition, the broker and clients might display errors such as "Could not complete operations" (broker) and "Remote did not respond to a drain request in time" (client). This issue is now resolved.
    • Release Notes Docs Status:
      Documented as Resolved Issue
    • QE Test Coverage:
    • Upstream Jira:
      ARTEMIS-2642
    • Verified:
      Verified in a release

      Description

      This problem occurs when AMQP clients publish messages to, and consume messages from, the broker at rates in excess of 10,000 messages per second. It has been seen with Qpid JMS and the .NET AMQP client, but never with Artemis Core JMS or OpenWire clients. We therefore think that the problem is related to the broker's AMQP support.

      A variety of failure modes can be reproduced using a test case that publishes from twenty concurrent threads and consumes from twenty concurrent threads, with the consumers arranged into shared consumers in groups of five. It is not clear at this stage whether the subscription mode is significant. In addition, although the problem has so far been seen only in a master-slave, shared-nothing replication set-up, it is not clear whether the topology is relevant.
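      For reference, a master-slave, shared-nothing replication pair of the kind used here is typically configured with the replication HA policy in each broker's broker.xml. A minimal sketch (the group name "example-group" is illustrative; element names follow the standard Artemis schema):

      <!-- broker.xml on the live (master) broker -->
      <ha-policy>
        <replication>
          <master>
            <!-- pair this live broker with backups sharing the same group name -->
            <group-name>example-group</group-name>
            <check-for-live-server>true</check-for-live-server>
          </master>
        </replication>
      </ha-policy>

      <!-- broker.xml on the backup (slave) broker -->
      <ha-policy>
        <replication>
          <slave>
            <group-name>example-group</group-name>
            <allow-failback>true</allow-failback>
          </slave>
        </replication>
      </ha-policy>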

      All the failure modes have in common that message consumption stops, even though new messages are being published. There may, or may not, be exceptions in the broker or the client.

      In some cases we see this message in the broker:

      2020-03-10 00:15:33,115 WARN  [org.apache.activemq.artemis.core.server] AMQ222024: Could not complete operations on IO context OperationContextImpl [742496659] [minimalStore=0, storeLineUp=0, stored=0, minimalReplicated=1, replicationLineUp=1, replicated=0, paged=0, minimalPage=0, pageLineUp=0, errorCode=-1, errorMessage=null, executorsPending=0, executor=OrderedExecutor(tasks=[])]Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=IOCallback(PageSubscriptionImpl) ]
      Task = TaskHolder [storeLined=0, replicationLined=1, pageLined=0, task=org.apache.activemq.artemis.core.journal.impl.SimpleWaitIOCallback]
      

      In some cases we see this exception in the client:

      Caused by: org.apache.qpid.jms.provider.exceptions.ProviderOperationTimedOutException: Remote did not respond to a drain request in time
              at org.apache.qpid.jms.provider.amqp.AmqpConsumer.lambda$stop$1(AmqpConsumer.java:179) ~[qpid-jms-client-0.48.0.jar!/:na]
              at io.netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:127) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:416) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:331) ~[netty-transport-native-epoll-4.1.39.Final-linux-x86_64.jar!/:4.1.39.Final]
              at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:918) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
              at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.39.Final.jar!/:4.1.39.Final]
      

              People

              Assignee:
              clebert.suconic Clebert Suconic
              Reporter:
              kboone Kevin Boone
              Tester:
              Tiago Bueno Tiago Bueno
              Votes:
              0
              Watchers:
              2

                Dates

                Created:
                Updated:
                Resolved: