- Bug
- Resolution: Done
- Blocker
- 7.1.0.ER3
- None
This is a regression against EAP 7.1.0.ER2 and EAP 7.0.
Customer story
Large numbers of messages can be lost when a core bridge is used to redistribute messages: they disappear inside the Artemis cluster. This is a severe issue that prevents the use of clustering in production.
Topology
- node1 and node2 - dedicated live-backup pair
- node3 - with core bridge resending messages from InQueue on node3 to OutQueue on node1
Some messages are not delivered to the queue on node1/node2.
Trace logs from the bridge server (node3) suggest that the message was successfully sent over the bridge and acknowledged:
11:50:16,182 TRACE [org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl] (Thread-0 (ActiveMQ-client-netty-threads)) BridgeImpl::sendAcknowledged received confirmation for message LargeServerMessage[messageID=601,durable=true,userID=f0f81ccc-7607-11e7-804f-001b217d6d57,priority=4, timestamp=Mon Jul 31 11:50:13 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[__AMQ_CID=ef7ceddc-7607-11e7-804f-001b217d6d57,count=157,color=RED,_AMQ_BRIDGE_DUP=[DED6 B129 7607 11E7 B35D 001B 217D 6D57 0000 0000 0000 0259),counter=158,_AMQ_DUPL_ID=11976d90-f23b-4f1e-867e-c12b0573a2fe1501516213773,_AMQ_LARGE_SIZE=409617]]@1197599459
11:50:16,182 TRACE [org.apache.activemq.artemis.core.server.cluster.impl.BridgeImpl] (Thread-0 (ActiveMQ-client-netty-threads)) BridgeImpl::sendAcknowledged bridge BridgeImpl@3567a236 [name=my-bridge, queue=QueueImpl[name=jms.queue.InQueue, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=ded6b129-7607-11e7-b35d-001b217d6d57]]@11f0ea44 targetConnector=ServerLocatorImpl (identity=Bridge my-bridge) [initialConnectors=[TransportConfiguration(name=core-bridge-connector-0, factory=org-apache-activemq-artemis-core-remoting-impl-netty-NettyConnectorFactory) ?port=5445&localAddress=127-0-0-1&useNio=true&host=127-0-0-1&useNioGlobalWorkerPool=true], discoveryGroupConfiguration=null]] Acking Reference[601]:RELIABLE:LargeServerMessage[messageID=601,durable=true,userID=f0f81ccc-7607-11e7-804f-001b217d6d57,priority=4, timestamp=Mon Jul 31 11:50:13 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[__AMQ_CID=ef7ceddc-7607-11e7-804f-001b217d6d57,count=157,color=RED,_AMQ_BRIDGE_DUP=[DED6 B129 7607 11E7 B35D 001B 217D 6D57 0000 0000 0000 0259),counter=158,_AMQ_DUPL_ID=11976d90-f23b-4f1e-867e-c12b0573a2fe1501516213773,_AMQ_LARGE_SIZE=409617]]@1197599459 on queue QueueImpl[name=jms.queue.InQueue, postOffice=PostOfficeImpl [server=ActiveMQServerImpl::serverUUID=ded6b129-7607-11e7-b35d-001b217d6d57]]@11f0ea44
However, searching for the _AMQ_DUPL_ID property of this message turns up only a single trace log entry on node1:
11:50:16,146 TRACE [org.apache.activemq.artemis.core.server.impl.ServerSessionImpl] (Thread-11 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$3@7d0731a8)) sendLarge::LargeServerMessage[messageID=665,durable=true,userID=f0f81ccc-7607-11e7-804f-001b217d6d57,priority=4, timestamp=Mon Jul 31 11:50:13 EDT 2017,expiration=0, durable=true, address=jms.queue.OutQueue,properties=TypedProperties[__AMQ_CID=ef7ceddc-7607-11e7-804f-001b217d6d57,count=157,color=RED,_AMQ_BRIDGE_DUP=[DED6 B129 7607 11E7 B35D 001B 217D 6D57 0000 0000 0000 0259),counter=158,_AMQ_DUPL_ID=11976d90-f23b-4f1e-867e-c12b0573a2fe1501516213773,_AMQ_LARGE_SIZE=409617]]@1299746455
As a result, the message is missing from the target server's queue.
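The log search described above can be sketched as a small Java helper. This is only an illustration, not part of the reproducer: the class name `DuplIdScan` and its methods are hypothetical, the regular expression assumes the property appears as `_AMQ_DUPL_ID=<value>` followed by `,` or `]` as in the trace entries quoted here, and the sample lines in `main` are shortened stand-ins for real log lines.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts how often each _AMQ_DUPL_ID value appears in a set of log lines.
public class DuplIdScan {

    // Assumes the value ends at the next ',' or ']' as in the trace entries in this report.
    private static final Pattern DUPL_ID = Pattern.compile("_AMQ_DUPL_ID=([^,\\]]+)");

    // Returns the _AMQ_DUPL_ID value found in a log line, if any.
    public static Optional<String> extractDuplId(String logLine) {
        Matcher m = DUPL_ID.matcher(logLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Counts log lines per _AMQ_DUPL_ID value, preserving first-seen order.
    public static Map<String, Integer> countByDuplId(List<String> logLines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : logLines) {
            extractDuplId(line).ifPresent(id -> counts.merge(id, 1, Integer::sum));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Shortened, illustrative lines; real input would be the full node1 trace log.
        List<String> node1Log = List.of(
            "TRACE sendLarge::LargeServerMessage[messageID=665,"
                + "_AMQ_DUPL_ID=11976d90-f23b-4f1e-867e-c12b0573a2fe1501516213773,"
                + "_AMQ_LARGE_SIZE=409617]]",
            "TRACE some line without the property");
        countByDuplId(node1Log).forEach((id, n) -> System.out.println(id + " -> " + n));
    }
}
```

Running the same count over the node3 and node1 logs and comparing the per-ID totals is one way to spot messages that were acknowledged on the bridge but appear only once on the target.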
Notes
- This is an intermittent failure.
- The reproducer test focuses on failover; however, the message loss happens before the live node1 is killed.
The issue is under investigation.
- is incorporated by
  - JBEAP-12695 Upgrade Artemis 1.5.5.jbossorg-007 (Closed)
- relates to
  - JBEAP-12566 Unresponsive broker caused by synchronization issue in QueueImpl (Closed)