I was able to reproduce this in a local environment by setting up a relatively vanilla 2-node broker cluster (clustering may not be strictly necessary). Memory was set to 4G min/max on both brokers. To see the issue in the logs, enable TRACE logging for the MQTT protocol package:
logger.org.apache.activemq.artemis.core.protocol.mqtt.level=TRACE
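On brokers that use the JBoss LogManager configuration (etc/logging.properties), the category typically also has to be appended to the existing loggers= list for the level setting to take effect; a minimal sketch, with the other entries in that list abbreviated:

# etc/logging.properties on each broker
loggers=<existing entries>,org.apache.activemq.artemis.core.protocol.mqtt
logger.org.apache.activemq.artemis.core.protocol.mqtt.level=TRACE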
To reproduce, set up the broker cluster, then extract the attached consumer and producer applications locally.
On the node1 broker (or even both), try modifying the network to introduce some packet delay and loss (as root):
[root@node1 ~]# tc qdisc add dev eth0 root netem delay 600ms 200ms loss 5% 25% distribution normal
This may not be strictly necessary as I was able to reproduce the issue once or twice without it, but reproduction was more consistent with it. If possible, use a multi-homed host for this, so the broker can be configured to use the modified interface, while ssh / scp can use the other interface.
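To confirm the netem rule is in place, and to remove it once the test is done (assuming the same eth0 interface as above):

[root@node1 ~]# tc qdisc show dev eth0
[root@node1 ~]# tc qdisc del dev eth0 root netem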
1. Modify the consumer's jndi.properties so the remote.address property points to the node2 broker in the cluster, then start the consumer.
2. Modify the producer's jndi.properties so the remote.address property points to the node1 broker in the cluster, then start the producer (an illustrative remote.address sketch appears after these steps).
3. Wait until the producer seems to stop producing messages. If the producer exits cleanly, it probably means the issue didn't reproduce.
4. Using grep against the artemis.log of the broker the producer is connected to (node1), compare the count of inbound PUBLISH events against the count of outbound PUBREC events:
grep PUBLISH ../log/artemis.log | grep ' IN <<' | wc -l
grep PUBREC ../log/artemis.log | grep ' OUT >>' | wc -l
If these counts are unequal, the broker accepted a QoS 2 PUBLISH but never sent the matching PUBREC, and there should be a hung thread in the producer application waiting on that acknowledgement. If the broker is restarted, the producer should resume producing and finish the batch.
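For reference on steps 1 and 2, the jndi.properties change is just a matter of pointing remote.address at the desired broker. The lines below are purely illustrative; the hostnames, port, and exact value format are assumptions, so use whatever form ships with the attached applications:

# consumer jndi.properties - point at the node2 broker (illustrative values)
remote.address=node2.example.com:61616
# producer jndi.properties - point at the node1 broker (illustrative values)
remote.address=node1.example.com:61616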